Abstract

We analyze the Ensemble and Polynomial Chaos Kalman filters applied to nonlinear stationary Bayesian 
inverse problems. In a sequential data assimilation setting such stationary problems arise in each step of 
either filter. We give a new interpretation of the approximations produced by these two popular filters 
in the Bayesian context and prove that, in the limit of large ensemble or high polynomial degree, both 
methods yield approximations which converge to a well-defined random variable termed the analysis random
variable. We then show that this analysis variable is more closely related to a specific linear Bayes
estimator than to the solution of the associated Bayesian inverse problem given by the posterior measure. 
This suggests limited or at least guarded use of these generalized Kalman filter methods for the purpose 
of uncertainty quantification. 


1 Introduction 

Due to increasing attention to uncertainty quantification (UQ) for complex systems, in particular as relates to 
the study and solution of partial differential equations (PDEs) with random data, interest has also focussed 
on inverse problems for random PDEs. In particular, the Bayesian approach to inverse problems has become 
popular in this context. From a UQ perspective the inverse problem is of tremendous interest since incorporating
any available information into the probability law of an uncertain quantity will, in general, reduce
uncertainty and lead to improved stochastic models. 

We consider in this work the fundamental task of inferring knowledge about an unknown element $u \in \mathcal{X}$
from a separable Hilbert space $\mathcal{X}$ by observing finite-dimensional noisy data

$$ z = G(u) + \varepsilon, \qquad (1) $$

where $G : \mathcal{X} \to \mathbb{R}^d$ denotes the known (and deterministic) parameter-to-solution map and $\varepsilon$ the observational
noise. Adopting the Bayesian perspective, we assume a probability measure $\mu_0$ on $\mathcal{X}$ to be given describing
our prior knowledge or belief about u which may be based, e.g., on physical reasoning, expert knowledge or 
previously collected data. We wish to highlight the distinction between the two main tasks associated with 
Bayesian inverse problems, namely identification and inference, where the latter may include the former. 
Identification refers to the task of determining an element $u \in \mathcal{X}$ which best explains the observed data
$z$ in accordance with given a priori assumptions, yielding a best guess or best single approximation to the
unknown $u$. By inference we mean the gain in knowledge by merging a prior probabilistic model $\mu_0$ with
new information $z \in \mathbb{R}^d$ to obtain an updated model which represents the new understanding or belief
about $u$.

This incorporation of new information is realized mathematically by conditioning the prior probability
measure $\mu_0$ on the event $\{G(u) + \varepsilon = z\}$ and is thus rooted in Kolmogorov’s fundamental concept of
conditional expectation [31]. Bayes’ rule provides an analytic expression for the resulting conditioned or
posterior distribution in terms of the prior distribution and provides the main tool in Bayesian inference and
Bayesian inverse problems (BIPs).

While BIPs enjoy a number of favorable theoretical properties compared with their deterministic counterparts,
i.e., they are well-posed and their solution in the form of the a posteriori measure is, in a certain sense,
explicitly characterized, they do pose significant computational challenges in that they entail calculations 
with highly correlated and complex distributions in high-dimensional spaces. The primary “workhorse” here 
is the Markov Chain Monte Carlo (MCMC) method [15], whose continued improvement drives a very active 
field of research. However, MCMC simulations can be quite costly, since the chain has to run long enough
to give sufficiently accurate estimates and each iteration typically requires one evaluation of the forward map
$G$, e.g., one PDE solve. Thus, for online monitoring or control of complex dynamical systems such as arise in
weather forecasting or oil reservoir management, MCMC methods are prohibitively expensive, and filtering
methods like the Kalman filter or the Ensemble Kalman filter are often applied to the associated state or
parameter estimation problem. Moreover, in dynamical systems where observational data arrives sequentially
in time, Kalman filter-type methods provide the significant advantage that their recursive structure is adapted
to this sequential availability of data (see [38, Section 5.4] for a nice discussion of this issue). So far, the
Kalman filter (KF) [21] and its generalizations have mainly been used for state estimation, i.e., for identification
rather than for Bayesian inference for quantifying uncertainty. In recent years, however, these methods
have drawn the attention of the growing UQ community, e.g. [19, 18, 22], and are being increasingly applied
also to Bayesian inverse problems. The point of departure is typically the Ensemble Kalman filter (EnKF)
[11], an extension of the KF to nonlinear models of type (1). As an example of this development, the authors
of [4, 30, 34, 33, 36, 35] have combined the idea of the EnKF with the computationally attractive representation
of random variables in a polynomial chaos expansion to develop an efficient method for Bayesian inverse
problems. In place of (deterministic) state estimation, these methods model the uncertain state as a random
variable which is updated with the arrival of each new set of observations. We will refer to this approach in the
following as the Polynomial Chaos Kalman filter (PCKF). It was the study of this new PCKF method which
motivated this work, because, although its authors gave a motivation for deriving their algorithm, the random
variable approximated by the PCKF is not clearly characterized. The same is true for the EnKF: Despite its
many documented applications, a detailed description of the nature or distribution of the analysis ensemble
produced by one EnKF update is still lacking. Only occasional hints that the EnKF generally fails to yield an
ensemble distributed according to the posterior measure can be found in the literature.

The present work fills this gap and clarifies the stochastic model underlying the EnKF and PCKF. We
determine the precise quantities approximated by the EnKF and PCKF and how these approximations relate
to the solution of Bayesian inverse problems and Bayes estimators. In addition, we prove convergence
results for both methods in the limit of increasing “resolution”, i.e., for large ensemble size for the EnKF
and large polynomial degree for the PCKF, respectively. The question of convergence of the EnKF or PCKF
has also not been addressed in the literature so far. To the authors’ knowledge, the only related result is [25], where the
convergence of the EnKF applied to data assimilation in linear dynamical systems was studied.

The remainder of this paper is organized as follows: Section 2 briefly recalls the Bayesian approach to
inverse problems as well as Bayes estimators. In Section 3 we describe and analyze the EnKF and PCKF. In
particular, we prove that the approximations provided by these generalized Kalman filtering methods converge
to a certain analysis random variable. A characterization of this analysis random variable in light of Bayes
estimators is further given in Section 4 where we show that its distribution, in general, differs from the
desired posterior measure. Moreover, we illustrate the performance of the EnKF and PCKF and the difference
between their approximations and the solution of the Bayesian inverse problem for a simple 1D boundary
value problem and a simple dynamical system in Section 5. Section 6 provides a summary and conclusion.


2 Bayesian Inverse Problems and Bayes Estimators

In this section we introduce the basic concepts of the Bayesian approach to inverse problems. Throughout, 
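let $|\cdot|$ denote the Euclidean norm on $\mathbb{R}^d$, $\|\cdot\|$ the norm and $\langle\cdot,\cdot\rangle$ the inner product in a general separable
Hilbert space $\mathcal{X}$, and $\mathcal{Y}$ a second separable Hilbert space. By $\mathcal{L}(\mathcal{X}, \mathcal{Y})$ we denote the set of all bounded linear
operators $A : \mathcal{X} \to \mathcal{Y}$. Note that $\mathcal{L}(\mathcal{X}, \mathcal{Y})$ is isometrically isomorphic to the tensor product of the Hilbert
spaces $\mathcal{X} \otimes \mathcal{Y}$ [24, 32].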

In order to regularize the usually ill-posed least-squares formulation

$$ u = \operatorname*{argmin}_{v \in \mathcal{X}} |z - G(v)|^2 $$

of the inverse problem (1), one incorporates additional prior information about the desired $u$ into the
(deterministic) identification problem by way of a regularization functional [9], $R : \mathcal{X} \to [0, \infty]$, and solves
for

$$ u_\alpha = \operatorname*{argmin}_{v \in \mathcal{X}} |z - G(v)|^2 + \alpha R(v), $$

where $\alpha \in [0, \infty)$ serves as a regularization parameter to be chosen wisely [1]. A further possibility for
regularization is to restrict $u$ to a subset or subspace $\tilde{\mathcal{X}} \subset \mathcal{X}$, e.g., by using a stronger norm of $u$ as the
regularization functional. Broadly speaking, the Bayesian approach may be viewed as yet another way of
modelling prior information on $u$ and adding it to the inverse problem. In this case we express our prior belief
about $u$ through a probability distribution $\mu_0$ on the Hilbert space $\mathcal{X}$, by which a quantitative preference of
some solutions $u$ over others may be given by assigning higher and lower probabilities. However, the goal
in the Bayesian approach is not the identification of one specific $u \in \mathcal{X}$, but rather inference on $u$, i.e., we
would like to learn from the data in a statistical or probabilistic sense by adjusting our prior belief $\mu_0$ about
$u$ in accordance with the newly available data $z$. The task of identification may also be achieved within the
Bayesian framework through Bayes estimates and Bayes estimators, which are discussed in Section 2.3.

In the Bayesian setting the deterministic model (1) becomes

$$ Z = G(U) + \varepsilon, \qquad (2) $$

where now $\varepsilon$, and hence $Z$, are $\mathbb{R}^d$-valued random variables. For the unknown random variable $U$ with values
in $\mathcal{X}$ and prior probability distribution $\mu_0$, we seek the posterior probability distribution given the available
observations $Z = z$. Before giving a precise definition of the posterior distribution we require some basic
concepts from probability theory.

2.1 Probability Measures and Random Variables 

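Let $(\Omega, \mathcal{F}, \mathbb{P})$ denote a probability space and $\mathcal{B}(\mathcal{X})$ the Borel $\sigma$-algebra of $\mathcal{X}$ generated by the open sets
in $\mathcal{X}$ w.r.t. $\|\cdot\|$. A measurable mapping $X : (\Omega, \mathcal{F}) \to (\mathcal{X}, \mathcal{B}(\mathcal{X}))$ is called a random variable (RV) and
the measure $\mathbb{P}_X := \mathbb{P} \circ X^{-1}$, i.e., $\mathbb{P}_X(A) = \mathbb{P}(X^{-1}(A))$ for all $A \in \mathcal{B}(\mathcal{X})$, defines the distribution of
$X$ as the push-forward measure of $\mathbb{P}$ under $X$. Conversely, given a probability measure $\mu$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$,
we mean by $X \sim \mu$ that $\mathbb{P}_X = \mu$. Further, let $\sigma(X) \subset \mathcal{F}$ denote the $\sigma$-algebra generated by $X$, i.e.,
$\sigma(X) = \{X^{-1}(A) : A \in \mathcal{B}(\mathcal{X})\}$.

The Bochner space of $p$-integrable $\mathcal{X}$-valued RVs, i.e., the space of (equivalence classes of) RVs $X : \Omega \to
\mathcal{X}$ such that $\int_\Omega \|X(\omega)\|^p \, \mathbb{P}(d\omega) < \infty$, is denoted by $L^p(\Omega, \mathcal{F}, \mathbb{P}; \mathcal{X})$ or simply $L^p(\mathcal{X})$ when the context is
clear.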

An element $m \in \mathcal{X}$ is called the mean of a RV $X$ if for any $x \in \mathcal{X}$ there holds $\langle x, m\rangle = \mathbb{E}[\langle x, X\rangle]$.
Here and in the following $\mathbb{E}$ denotes the expectation operator w.r.t. $\mathbb{P}$. If $X \in L^1(\Omega, \mathcal{F}, \mathbb{P}; \mathcal{X})$ then its mean
is given by the Bochner integral $m = \mathbb{E}[X] = \int_\Omega X(\omega)\, \mathbb{P}(d\omega)$. A bilinear form $C : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is
called the covariance $\mathrm{Cov}(X, Y)$ of two RVs $X : \Omega \to \mathcal{X}$ and $Y : \Omega \to \mathcal{Y}$ if it satisfies $C(x, y) =
\mathbb{E}[\langle x, X - \mathbb{E}[X]\rangle \, \langle y, Y - \mathbb{E}[Y]\rangle]$ for all $x \in \mathcal{X}$ and $y \in \mathcal{Y}$, and we set $\mathrm{Cov}(X) := \mathrm{Cov}(X, X)$. We shall
also employ the identity $\mathrm{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X]) \otimes (Y - \mathbb{E}[Y])]$ when convenient. The covariance
$\mathrm{Cov}(X, Y)$ can also be defined equivalently as an operator $C : \mathcal{X} \to \mathcal{Y}$ such that $\langle Cx, y\rangle = C(x, y)$. We
will mainly work with the latter definition in the following but on occasion will also apply the tensor product
form $\mathbb{E}[(X - \mathbb{E}[X]) \otimes (Y - \mathbb{E}[Y])]$. The definitions of mean and covariance extend to RVs with values in
separable Banach spaces by considering the topological duals of $\mathcal{X}$ and $\mathcal{Y}$, respectively.

We also require the notion of distance between probability measures, one of which is given by the
Hellinger metric $d_H$: given two probability measures $\mu_1$ and $\mu_2$ on the Hilbert space $\mathcal{X}$, it is defined as

$$ d_H(\mu_1, \mu_2) := \left[ \int_{\mathcal{X}} \left( \sqrt{\frac{d\mu_1}{d\nu}(u)} - \sqrt{\frac{d\mu_2}{d\nu}(u)} \right)^2 \nu(du) \right]^{1/2}, $$

where $\nu$ is a dominating measure of $\mu_1$ and $\mu_2$, e.g., $\nu = (\mu_1 + \mu_2)/2$. Note that the definition of the
Hellinger metric is independent of the dominating measure. Another metric for probability measures which
we will employ in the following is the Wasserstein metric

$$ d_W(\mu_1, \mu_2) := \sup_{\mathrm{Lip}(f) \le 1} \left| \int_{\mathcal{X}} f(u)\, \mu_1(du) - \int_{\mathcal{X}} f(u)\, \mu_2(du) \right|, $$

where the supremum is taken over all $f : \mathcal{X} \to \mathbb{R}$ which satisfy $|f(u) - f(v)| \le \|u - v\|$. For relations of
the Hellinger and Wasserstein metrics to other probability metrics such as total variation distance, we refer to
[16].
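
The independence of the Hellinger metric from the dominating measure can be checked numerically for measures on a finite set. The following is a minimal sketch, assuming the unnormalized form $d_H = [\int (\sqrt{d\mu_1/d\nu} - \sqrt{d\mu_2/d\nu})^2 \, d\nu]^{1/2}$ and two hypothetical probability vectors `p` and `q`; the function names are illustrative only.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two probability vectors,
    with counting measure as the dominating measure."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def hellinger_via_mixture(p, q):
    """Same distance, but with nu = (p + q) / 2 as the dominating measure."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nu = 0.5 * (p + q)
    support = nu > 0                 # densities are only needed nu-almost everywhere
    dp = p[support] / nu[support]    # d mu_1 / d nu
    dq = q[support] / nu[support]    # d mu_2 / d nu
    return np.sqrt(np.sum((np.sqrt(dp) - np.sqrt(dq)) ** 2 * nu[support]))

# Hypothetical example measures on a four-point set.
p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.1, 0.4, 0.4, 0.1])
d_counting = hellinger(p, q)
d_mixture = hellinger_via_mixture(p, q)
```

Both functions return the same value up to rounding, since the factor $1/\nu$ inside the squared difference cancels against the integration weight $\nu$.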

In the following, we will use upper case Latin letters such as $X, Y, Z, U$ to denote RVs on Hilbert spaces
and lower case Latin letters like $x, y, z, u$ for elements in these Hilbert spaces or realizations of the associated
RVs, respectively. Greek letters such as $\varepsilon$, $\eta$ and $\xi$ will be used to denote RVs as well as their realizations and
$\mu$ and $\nu$ (with various subscripts) will denote measures on the Hilbert space $\mathcal{X}$ and $\mathbb{R}^d$, respectively.


2.2 Bayes’ Rule and the Posterior Measure 

Bayesian inference consists in updating our prior knowledge on the unknown quantity $U$, reflecting a gain
in knowledge due to new observations. The distribution of the RV $U$, characterized by the probabilities
$\mathbb{P}(U \in B)$ for $B \in \mathcal{B}(\mathcal{X})$, quantifies in stochastic terms our knowledge about the uncertainty associated
with $U$. When new information becomes available, such as knowing that the event $Z = z$ has occurred, this
is reflected in our quantitative description as the “conditional distribution of $U$ given $\{Z = z\}$”, denoted
$\mathbb{P}(U \in B \mid Z = z)$. Unfortunately, $\mathbb{P}(U \in B \mid Z = z)$ cannot be defined in an elementary fashion when
$\mathbb{P}(Z = z) = 0$, in which case the conditional distribution is defined by an integral relation. The key concept
here is that of conditional expectation: Given RVs $X \in L^1(\Omega, \mathcal{F}, \mathbb{P}; \mathcal{X})$ and $Y : \Omega \to \mathcal{Y}$, we define the
conditional expectation $\mathbb{E}[X|Y]$ of $X$ given $Y$ as any $\sigma(Y)$-measurable mapping $\mathbb{E}[X|Y] : \Omega \to \mathcal{X}$ which
satisfies

$$ \int_A \mathbb{E}[X|Y] \, \mathbb{P}(d\omega) = \int_A X \, \mathbb{P}(d\omega) \qquad \forall A \in \sigma(Y). $$

By the Doob-Dynkin Lemma [20, Lemma 1.13] there exists a measurable function $\phi : \mathcal{Y} \to \mathcal{X}$ such that
$\mathbb{E}[X|Y] = \phi(Y)$ $\mathbb{P}$-almost surely. We note that this does not determine a unique function $\phi$ but rather an
equivalence class of measurable functions, where $\phi_1 \sim \phi_2$ iff $\mathbb{P}(Y \in \{y \in \mathcal{Y} : \phi_1(y) \ne \phi_2(y)\}) = 0$. For a
specific realization $y$ of $Y$ (and a specific $\phi$), we define

$$ \mathbb{E}[X \mid Y = y] := \phi(y) \in \mathcal{X}. $$

Setting $X = \mathbf{1}_{\{U \in B\}}$ one can then define for each fixed $B \in \mathcal{B}(\mathcal{X})$

$$ \mathbb{P}(U \in B \mid Z = z) := \mathbb{E}[\mathbf{1}_{\{U \in B\}} \mid Z = z] \qquad (3) $$

as an equivalence class of measurable functions $\mathbb{R}^d \to [0,1]$. One would like to view this, conversely, as a
family of probability measures with the realization $z$ as a parameter, giving the posterior distribution of $U$
resulting from having made the observation $Z = z$. Unfortunately, this construction need not, in general,
yield a probability measure for each fixed value of $z$ (cf. [31]). In case $\mathcal{X}$ is a separable Hilbert space, a
function

$$ Q : \mathcal{B}(\mathcal{X}) \times \mathbb{R}^d \to \mathbb{R} $$

can be shown to exist (cf. [31]) such that

(a) For each $z \in \mathbb{R}^d$, $Q(\cdot, z)$ is a probability measure on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$.

(b) For each $B \in \mathcal{B}(\mathcal{X})$ the function

$$ \mathbb{R}^d \ni z \mapsto Q(B, z) $$

is a representative of the equivalence class (3), i.e., it is measurable and there holds

$$ \mathbb{P}(U \in B, Z \in A) = \int_A Q(B, z) \, \mathbb{P}_Z(dz) \qquad \forall A \in \mathcal{B}(\mathbb{R}^d). $$

Such a function $Q$, also denoted by $\mu_{U|Z}$, is called the regular conditional distribution of $U$ given $Z$ and is
defined uniquely up to sets of $z$-values of $\mathbb{P}_Z$-measure zero. We have thus arrived at a consistent definition of
the posterior probability $\mathbb{P}(U \in B \mid Z = z)$ as $\mu_{U|Z}(B, z)$.

It is helpful to maintain a clear distinction between conditional and posterior quantities: the former
contain the - as yet unrealized - observation as a parameter, while in the latter the observation has been made.
Specifically, $\mu_{U|Z}$ is the conditional measure of $U$ conditioned on $Z$, whereas $\mu_{U|Z}(\cdot, z)$ denotes the posterior
measure of $U$ for the observation $Z = z$.

We now recall how Bayes’ rule yields an explicit expression for the regular conditional distribution $\mu_{U|Z}$.
To this end, we make the following assumptions for the model (2).

Assumption 1.

1. $U \sim \mu_0$, $\varepsilon \sim \nu_\varepsilon$ and $(U, \varepsilon) \sim \mu_0 \otimes \nu_\varepsilon$, i.e., $U$ and $\varepsilon$ are independent.

2. $\nu_\varepsilon = \rho(\varepsilon)\, d\varepsilon$ where $\rho(\varepsilon) = C e^{-\ell(\varepsilon)}$ with $C > 0$ and $\ell : \mathbb{R}^d \to \mathbb{R}_0^+$ measurable and nonnegative. Here
$d\varepsilon$ denotes Lebesgue measure on $\mathbb{R}^d$.

3. $G : \mathcal{X} \to \mathbb{R}^d$ is continuous.

By Assumption 1, the distribution $\nu_Z$ of $Z$ in (2) is determined as $\nu_Z = C \gamma(z)\, dz$ where $C > 0$ and

$$ \gamma(z) := \int_{\mathcal{X}} e^{-\ell(z - G(u))} \, \mu_0(du). $$

We note that $\gamma(z) > 0$ is well-defined since $0 < |e^{-\ell(z - G(u))}| \le 1$ and $\gamma \in L^1(\mathbb{R}^d)$ due to Fubini’s theorem
[20, Theorem 1.27]. In particular, we have that $(U, Z) \sim \mu$ with $\mu(du, dz) = C e^{-\ell(z - G(u))} \, \mu_0(du) \otimes dz$
where $dz$ again denotes Lebesgue measure on $\mathbb{R}^d$. Further, we introduce the potential

$$ \Phi(u; z) := \ell(z - G(u)), $$

for which we assume the following Lipschitz-like property:

Assumption 2. The potential $\Phi$ is continuous in $z$ in mean-square sense w.r.t. $\mu_0$, i.e., there exists a
nondecreasing function $\psi : [0, \infty) \to [0, \infty)$ with $\lim_{s \to 0} \psi(s) = \psi(0) = 0$ such that

$$ \mathbb{E}\left[ |\Phi(U; z_1) - \Phi(U; z_2)|^2 \right] = \int_{\mathcal{X}} |\Phi(u; z_1) - \Phi(u; z_2)|^2 \, \mu_0(du) \le \psi(|z_1 - z_2|). $$

For instance, there may exist a function $\theta \in L^2(\mathcal{X}, \mathcal{B}(\mathcal{X}), \mu_0; \mathbb{R})$ such that

$$ |\Phi(u; z_1) - \Phi(u; z_2)| \le \theta(u)\, \psi(|z_1 - z_2|). $$


Before stating the abstract version of Bayes’ Rule in Theorem 3, we recall the finite-dimensional case
$\mathcal{X} \simeq \mathbb{R}^n$ where it can be stated in terms of densities: here $\mu_0(du) = \pi_0(u)\, du$, and Bayes’ rule takes the form

$$ \pi^z(u) = \frac{1}{\gamma(z)} \, e^{-\Phi(u; z)} \, \pi_0(u), $$

where $e^{-\Phi(u; z)} = e^{-\ell(z - G(u))}$ represents the likelihood of observing $z$ when fixing $u$. The denominator $\gamma(z)$
can be interpreted as a normalizing constant to ensure $\int_{\mathbb{R}^n} \pi^z(u)\, du = 1$. We now show that, in the general
setting, Bayes’ rule yields (a version of) the (regular) conditional measure $\mu_{U|Z}$ of $U$ w.r.t. $Z$. The statement
of Theorem 3 differs from related results in [38, Theorem 4.2 and 6.31] insofar as we explicitly characterize
the posterior measure as a version of the regular conditional distribution and as we allow also for a general
prior $\mu_0$ and log-likelihood $\ell$.
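
The finite-dimensional density form of Bayes’ rule, posterior density proportional to likelihood times prior density, can be illustrated on a grid. The following is a minimal sketch under stated assumptions: $n = d = 1$, Gaussian noise $N(0, \sigma^2)$ (so that the negative log-likelihood is $e \mapsto e^2/(2\sigma^2)$), a standard normal prior, and a Riemann sum on a uniform grid in place of the integral. The function name `grid_posterior` and all parameter values are hypothetical.

```python
import numpy as np

def grid_posterior(u, prior_density, forward, z, sigma):
    """Posterior density on a uniform grid via Bayes' rule,
    assuming additive Gaussian noise N(0, sigma^2)."""
    du = u[1] - u[0]
    potential = 0.5 * ((z - forward(u)) / sigma) ** 2    # Phi(u; z) = l(z - G(u))
    unnormalized = np.exp(-potential) * prior_density(u)
    gamma = np.sum(unnormalized) * du                    # normalizing constant
    return unnormalized / gamma

u = np.linspace(-6.0, 6.0, 4001)
du = u[1] - u[0]
prior = lambda v: np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)   # N(0, 1) density

# Linear forward map: the posterior is Gaussian with mean 1 and variance 1/2.
post_linear = grid_posterior(u, prior, lambda v: v, z=2.0, sigma=1.0)
mean_linear = np.sum(u * post_linear) * du

# Nonlinear forward map: the posterior is no longer Gaussian.
post_cubic = grid_posterior(u, prior, lambda v: v**3, z=2.0, sigma=1.0)
mean_cubic = np.sum(u * post_cubic) * du
```

Such a grid approach is only feasible in very low dimensions; it is meant as a reference picture of the posterior, not as a practical solver.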

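Theorem 3. Let Assumptions 1 and 2 be satisfied and define for each $z \in \mathbb{R}^d$ a probability measure on
$(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ by

$$ \mu^z(du) := \frac{1}{\gamma(z)} \, e^{-\Phi(u; z)} \, \mu_0(du). \qquad (4) $$

Then the mapping $Q : \mathcal{B}(\mathcal{X}) \times \mathbb{R}^d \to [0, 1]$ given by

$$ Q(B, z) := \mu^z(B) \qquad \forall B \in \mathcal{B}(\mathcal{X}) $$

is a regular conditional distribution of $U$ given $Z$. We call $\mu^z$ the posterior measure (of $U$ given $Z = z$).
Moreover, $\mu^z$ depends continuously on $z$ w.r.t. the Hellinger metric, i.e., for any $z_1, z_2 \in \mathbb{R}^d$ with $|z_1 - z_2| \le r$
there holds

$$ d_H(\mu^{z_1}, \mu^{z_2}) \le C_r(z_1)\, \psi(|z_1 - z_2|) $$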


where Cr{zi) = (7(1 -|- min{ 7 ( 2 ;') : \zi — z'\ < r}^) ^ < -t-oo. 

Proof. Continuity with respect to the Hellinger metric is a slight generalization of [38, Theorem 4.2] and may
be proved in the same way with obvious modifications. To show that $Q$ is a regular conditional distribution we
verify the two properties (a) and (b). The first follows from the construction of $\mu^z$. For the second property,
note that measurability follows from continuity. The continuity of $\mu^z$ w.r.t. $z$ in the Hellinger metric implies
also that $\mu^z(B)$ depends continuously on $z$ due to the relations between Hellinger metric and total variation
distance (see [16]). Finally, we have for any $A \in \mathcal{B}(\mathbb{R}^d)$ and $B \in \mathcal{B}(\mathcal{X})$ that

$$ \mathbb{P}(U \in B, Z \in A) = \int_{A \times B} \mu(du, dz) = \int_A \int_B C e^{-\ell(z - G(u))} \, \mu_0(du)\, dz = \int_A Q(B, z)\, C \gamma(z)\, dz = \int_A Q(B, z)\, \nu_Z(dz), $$

which completes the proof.


□

Remark 4. Theorem 3 shows that the Lipschitz-like property of the potential $\Phi$ stated in Assumption 2 carries
over to the posterior $\mu^z$ for a general prior $\mu_0$ and an additive error $\varepsilon$ with Lebesgue density proportional to
$e^{-\ell(\varepsilon)}$. Roughly speaking, the negative log-likelihood $\ell$ and the posterior $\mu^z$ share the same local modulus of
continuity.

By Theorem 3 the Bayesian inverse problem is well-posed under mild conditions. It is also possible to
prove continuity of $\mu^z$ w.r.t. the forward map $G$, see [38, Section 4.4], which is crucial when the forward
map $G$ is realized by numerical approximation.

To give meaning to the mean and covariance of $U \sim \mu_0$ and $Z = G(U) + \varepsilon$, we make the further
assumption that all second moments exist:

Assumption 5. There holds

$$ \int_{\mathcal{X}} \left( \|u\|^2 + |G(u)|^2 \right) \mu_0(du) < +\infty \quad \text{and} \quad \int_{\mathbb{R}^d} |\varepsilon|^2 \, \nu_\varepsilon(d\varepsilon) < +\infty. $$


2.3 Bayes Estimators 


Although the posterior measure $\mu^z$ is, by definition, the solution to the Bayesian inverse problem, it is by no
means easy to compute in practice. In special cases, such as when $G$ is linear and $\mu_0$ and $\nu_\varepsilon$ are Gaussian
measures, or in the case of conjugate priors, closed-form expressions for $\mu^z$ are available. In general, however,
$\mu^z$ can only be computed in an approximate sense. Moreover, when the dimension of $\mathcal{X}$ is large or infinite,
visualizing, exploring or using $\mu^z$ for post-processing are demanding tasks.

More accessible quantities from Bayesian statistics [3], which are also closer in nature to the result of
deterministic parameter identification procedures than the posterior measure, are point estimates for the
unknown $u$. In the Bayesian setting a point estimate is a “best guess” $\hat{u}$ of $u$ based on posterior knowledge. Here
“best” is determined by a cost function $c : \mathcal{X} \to \mathbb{R}_0^+$ satisfying $c(0) = 0$ and $c(u) \le c(\lambda u)$ for any $u \in \mathcal{X}$
and $\lambda \ge 1$. This cost function describes the loss or costs $c(u - \hat{u})$ incurred when $\hat{u}$ is substituted for (the true)
$u$ for post-processing or decision making. Also more general forms of a cost function are possible, see, e.g.,
[2, 3].


For any realization $z \in \mathbb{R}^d$ of the observation RV $Z$ we introduce the (posterior) Bayes cost of the estimate
$\hat{u}$ w.r.t. $c$ as

$$ B_c(\hat{u}; z) := \int_{\mathcal{X}} c(u - \hat{u}) \, \mu^z(du), $$

and define the Bayes estimate $\hat{u}$ as a minimizer of this cost, i.e.,

$$ \hat{u} := \operatorname*{argmin}_{v \in \mathcal{X}} B_c(v; z), $$

assuming a unique minimizer exists. The Bayes estimator $\hat\phi : \mathbb{R}^d \to \mathcal{X}$ is then the mapping which assigns to
an observation $z$ the associated Bayes estimate $\hat{u}$, i.e.,

$$ \hat\phi : z \mapsto \operatorname*{argmin}_{v \in \mathcal{X}} B_c(v; z). $$

We assume measurability of $\hat\phi$ in the following and note that $\hat\phi$ is then also the minimizer of the expected or
prior Bayes cost

$$ B_c(\phi) := \mathbb{E}[B_c(\phi(Z); Z)] = \int_{\mathbb{R}^d} \int_{\mathcal{X}} c(u - \phi(z)) \, \mu^z(du)\, \nu_Z(dz) = \mathbb{E}[c(U - \phi(Z))], $$

i.e., for any other measurable $\phi : \mathbb{R}^d \to \mathcal{X}$ there holds

$$ \mathbb{E}[c(U - \hat\phi(Z))] \le \mathbb{E}[c(U - \phi(Z))]. $$

Remark 6. Since $\hat\phi = \operatorname{argmin}_\phi \mathbb{E}[c(U - \phi(Z))]$ it is possible to determine the estimator $\hat\phi$, and hence also
the estimate $\hat{u} = \hat\phi(z)$ for a given $z$, without actually computing the posterior measure $\mu^z$, as the integration
in $B_c(\phi)$ is carried out w.r.t. the prior measure. Therefore, Bayes estimators are typically easier to compute
or approximate than $\mu^z$.

We now introduce two very common Bayes estimators: the posterior mean estimator and the maximum a 
posteriori estimator. 

2.3.1 Posterior Mean Estimator 

For the cost function $c(u) = \|u\|^2$ the posterior Bayes cost

$$ B_c(\hat{u}; z) = \int_{\mathcal{X}} \|u - \hat{u}\|^2 \, \mu^z(du) $$

is minimized by the posterior mean $\hat{u} = u_{CM} := \int_{\mathcal{X}} u \, \mu^z(du)$, since for any Hilbert space-valued RV $X$ its
expectation $\mathbb{E}[X]$ is the minimizer of the functional $J_X(v) = \mathbb{E}[\|X - v\|^2]$, $v \in \mathcal{X}$. The corresponding Bayes
estimator for $c(u) = \|u\|^2$ is then given by

$$ \hat\phi_{CM}(z) := \int_{\mathcal{X}} u \, \mu^z(du). $$

In particular, $\hat\phi_{CM}(Z) = \mathbb{E}[U|Z]$ holds $\mathbb{P}$-almost surely.

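Remark 7. If $\mathcal{X}$ is only a Banach space then the expectation of an $\mathcal{X}$-valued RV $X$ need not minimize the
functional $J_X$, i.e., we have in general

$$ \mathbb{E}[X] \ne \operatorname*{argmin}_{v \in \mathcal{X}} \mathbb{E}[\|X - v\|^2]. $$

As a simple counterexample, consider $\mathcal{X} = \mathbb{R}^2$, $\|u\| = |u_1| + |u_2|$ and $X = (X_1, X_2)$ with independent
random variables $X_1$, $X_2$ such that

$$ \mathbb{P}(X_1 = -1) = p_1, \quad \mathbb{P}(X_1 = 1) = 1 - p_1 \quad \text{and} \quad \mathbb{P}(X_2 = -1) = p_2, \quad \mathbb{P}(X_2 = 1) = 1 - p_2. $$

Here $\mathbb{E}[X]$ minimizes $\mathbb{E}[\|X - u\|^2]$ iff $p_1 = p_2 = 0.5$. In fact, one can show $\mathbb{E}[X] = \operatorname{argmin}_{u \in \mathcal{X}} \mathbb{E}[\|X - u\|^2]$
if $X$ is distributed symmetrically w.r.t. its mean, i.e., if there holds $\mathbb{P}(X - \mathbb{E}[X] \in A) = \mathbb{P}(\mathbb{E}[X] - X \in A)$
for all $A \in \mathcal{B}(\mathcal{X})$.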


2.3.2 Maximum A Posteriori Estimator 

Another common estimator in Bayesian statistics is the maximum a posteriori (MAP) estimator $\hat\phi_{MAP}$. For
finite-dimensional $\mathcal{X} \simeq \mathbb{R}^n$ and absolutely continuous prior $\mu_0$, i.e., $\mu_0(du) = \pi_0(u)\, du$, the MAP estimate
is defined as

$$ \hat\phi_{MAP}(z) = \operatorname*{argmin}_{u \in \mathbb{R}^n} \Phi(u; z) - \log \pi_0(u) $$

provided the minimum exists for all $z \in \mathbb{R}^d$. For the definition of the MAP estimate via a cost function
and the Bayes cost, we refer to the literature, e.g., [23, Section 16.2] or the very recent work [5] for a novel
approach; for MAP estimates in infinite dimensions, we refer to [8].

There is an interesting link between the Bayes estimator $\hat\phi_{MAP}$ and the solution of the associated regularized
least-squares problem: If $R : \mathbb{R}^n \to [0, \infty)$ is a regularizing functional which satisfies $\int_{\mathbb{R}^n} \exp(-\frac{\alpha}{2\sigma^2} R(u))\, du <
+\infty$, then the solution $u_\alpha = \operatorname{argmin}_u |z - G(u)|^2 + \alpha R(u)$ coincides with the MAP estimate $\hat\phi_{MAP}(z)$ for
$\varepsilon \sim N(0, \sigma^2 I)$ and $\mu_0(du) \propto \exp(-\frac{\alpha}{2\sigma^2} R(u))\, du$.
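
The link between regularized least squares and the MAP estimate can be verified in the simplest case. The following is a minimal sketch, assuming a linear forward map given by a matrix and the quadratic regularizer $R(u) = |u|^2$, for which the prior proportional to $\exp(-\frac{\alpha}{2\sigma^2}|u|^2)$ is the Gaussian $N(0, \frac{\sigma^2}{\alpha} I)$ and the MAP estimate equals the Gaussian posterior mean. The matrix, data and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 5))    # hypothetical linear forward map (d = 3, n = 5)
z = rng.standard_normal(3)         # hypothetical data
alpha, sigma = 0.7, 0.3

# Tikhonov solution of min_u |z - G u|^2 + alpha |u|^2 via the normal equations.
u_tikhonov = np.linalg.solve(G.T @ G + alpha * np.eye(5), G.T @ z)

# MAP estimate for eps ~ N(0, sigma^2 I) and prior N(0, (sigma^2 / alpha) I).
# In this linear Gaussian setting the MAP estimate is the posterior mean.
C0 = (sigma**2 / alpha) * np.eye(5)
K = C0 @ G.T @ np.linalg.inv(G @ C0 @ G.T + sigma**2 * np.eye(3))
u_map = K @ z
```

The two vectors agree up to rounding for any $\sigma > 0$: the noise level only rescales the prior covariance and cancels in the gain.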

3 Analysis of Generalized Kalman Filters

In this section we consider Kalman filters and their application to the nonlinear Bayesian inverse problem 
(2). We begin with the classical Kalman filter for state estimation in linear dynamics and then consider two 
generalizations to the nonlinear setting which have been recently proposed for UQ in connection with inverse 
problems. We show that both methods can be understood as different discretizations of an updating scheme 
for a certain RV and prove that both Kalman filter methods converge to this RV when the discretization is 
refined. 

3.1 The Kalman Filter 

The Kalman filter [21] is a well-known method for sequential state estimation for incompletely observable,
linear discrete-time dynamics

$$ U_n = A_n U_{n-1} + \eta_n, \qquad Z_n = G_n U_n + \varepsilon_n, \qquad n = 1, 2, \ldots, \qquad (5) $$

where $(U_n)_{n \in \mathbb{N}}$ denotes the unknown, unobservable state and $(Z_n)_{n \in \mathbb{N}}$ the observable process. The operators
$A_n$ and $G_n$ are linear mappings in state space and from state to observation space, respectively, and the noise
processes $\eta_n$, $\varepsilon_n$ are usually assumed to have zero mean with known covariances. In addition, the mean and
covariance of $U_0$ need to be known and the RVs $U_0$, $\eta_n$, $\varepsilon_n$ are taken to be mutually independent. Then,
given observations $Z_1 = z_1, \ldots, Z_n = z_n$, the Kalman filter yields recursive equations for the minimum
variance estimates $\hat{u}_n$ of $U_n$ and their error covariances $\mathrm{Cov}(U_n - \hat{u}_n)$, see, e.g., [7, 37] for an introduction
and discussion.

Although the main advantage of the Kalman filter is its recursive structure, making it very efficient for state 
estimation in dynamical systems with sequentially arriving data, a detailed analysis of sequential methods is 
beyond the scope of this work. We focus instead on the application of the Kalman filter and its generalizations 
to time-independent systems of the form (2) and, in the linear case, 

$$ Z = GU + \varepsilon, \qquad (U, \varepsilon) \sim \mu_0 \otimes \nu_\varepsilon. \qquad (6) $$

We note that (6) can be seen as one step of the dynamical system (5) for $A_n = I$, $\eta_n \equiv 0$ and $G_n = G$.
Conversely, the state estimation problem for $U = U_0$, $U = U_n$ or $U = (U_0, U_1, \ldots, U_n)$ in (5) given
$Z = (Z_1, \ldots, Z_n) = (z_1, \ldots, z_n) = z$ can be reformulated as (6).

If $\hat{u}_0 = \mathbb{E}[U]$ is taken as an initial estimate for the unknown $U$ in (6) before observing $Z = z$, this results
in the initial error covariance $\mathrm{Cov}(U - \hat{u}_0) = \mathrm{Cov}(U) =: C_0$. Given data $Z = z$, the Kalman filter provides
a new estimate $\hat{u}_1$ and its error covariance $C_1 = \mathrm{Cov}(U - \hat{u}_1)$ via the updates

$$ \hat{u}_1 = \hat{u}_0 + K(z - G\hat{u}_0), \qquad C_1 = C_0 - K G C_0, \qquad (7) $$

where $K = C_0 G^* (G C_0 G^* + \Sigma)^{-1}$, $\Sigma = \mathrm{Cov}(\varepsilon)$, is known as the Kalman gain. In fact, by assimilating the
data $Z = z$ the Kalman filter produces an improved estimate, since its expected error is smaller than that of
the initial estimate in the sense that $C_0 - C_1$ is positive definite.

If $(U, Z)$ are jointly Gaussian RVs, i.e., $U \sim N(m_0, C_0)$ and $\varepsilon \sim N(0, \Sigma)$, the posterior measure $\mu^z$ of $U$
given $Z = z$ also has a Gaussian distribution $N(m^z, C^z)$ with

$$ m^z = m_0 + K(z - G m_0), \qquad C^z = C_0 - K G C_0, $$

see, e.g., [26]. Thus for $G$ linear and $U$, $\varepsilon$ independently Gaussian, the Kalman filter is seen to yield the
solution of the Bayesian inverse problem by providing the posterior mean and covariance, which in this case
also uniquely specify the Gaussian posterior measure $\mu^z$. However, we emphasize that the Kalman filter does
not directly approximate the posterior measure; it rather provides minimum variance estimates and their error
covariances for linear problems (5). Without the assumption that $\mu_0$ or $\nu_\varepsilon$ are Gaussian the Kalman filter will
not, in general, yield the first two posterior moments, nor is the posterior measure necessarily Gaussian.
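
The Kalman update for the stationary linear model can be written in a few lines. The following is a minimal sketch for finite-dimensional real matrices, where the adjoint is the transpose; the function name `kalman_update` and the example values are hypothetical.

```python
import numpy as np

def kalman_update(u0, C0, G, Sigma, z):
    """One Kalman analysis step for Z = G U + eps.

    Returns the updated estimate, its error covariance and the gain
    K = C0 G^T (G C0 G^T + Sigma)^{-1}.
    """
    S = G @ C0 @ G.T + Sigma              # covariance of Z
    K = np.linalg.solve(S, G @ C0).T      # uses symmetry of S and C0
    u1 = u0 + K @ (z - G @ u0)
    C1 = C0 - K @ G @ C0
    return u1, C1, K

# Hypothetical example with n = d = 2.
u0 = np.zeros(2)
C0 = np.eye(2)
G = np.array([[1.0, 0.0], [0.0, 2.0]])
Sigma = np.eye(2)
z = np.array([2.0, 5.0])
u1, C1, K = kalman_update(u0, C0, G, Sigma, z)
```

Solving a linear system with the covariance of $Z$ instead of forming its inverse explicitly is a common numerical choice; with a Gaussian prior and Gaussian noise, `u1` and `C1` are the posterior mean and covariance.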

In the following two subsections we consider generalizations of the Kalman filter to nonlinear problems (2).
The historically first such method was the extended Kalman filter (EKF), which is based on local linearizations
of the nonlinear map $G$, but which we will not consider here. We rather focus on the Ensemble Kalman
Filter (EnKF) introduced by Evensen [11] and the recently developed Polynomial Chaos Kalman Filter
(PCKF).

3.2 The Ensemble Kalman Filter 

Since its introduction in 1994, the EnKF has been investigated and evaluated in many publications [12, 6, 14,
13, 28]. However, the focus is usually on its application to state or parameter estimation rather than solving
Bayesian inverse problems. Recently, the interest in the EnKF for UQ in inverse problems has increased, see,
e.g., [18, 19, 22].

If we consider the model $Z = G(U) + \varepsilon$ with $(U, \varepsilon) \sim \mu_0 \otimes \nu_\varepsilon$ and given observations $z \in \mathbb{R}^d$, the EnKF
algorithm proceeds as follows:

1. Initial ensemble: Draw samples $u_1, \ldots, u_M$ of $U \sim \mu_0$.

2. Forecast: Draw samples $\varepsilon_1, \ldots, \varepsilon_M$ of $\varepsilon \sim \nu_\varepsilon$, set

$$ z_j = G(u_j) + \varepsilon_j, \qquad j = 1, \ldots, M, $$

yielding samples $z_1, \ldots, z_M$ of $Z \sim \nu_Z$.

3. Analysis: Update the initial ensemble $u = (u_1, \ldots, u_M)$ member by member via

$$ u_j^a = u_j + K(z - z_j), \qquad j = 1, \ldots, M, \qquad (8) $$

where $K = \mathrm{Cov}(u, z)\, \mathrm{Cov}(z)^{-1}$ and $\mathrm{Cov}(u, z)$ and $\mathrm{Cov}(z)$ are the empirical covariances of the
samples $u$ and $z = (z_1, \ldots, z_M)$, e.g.,

$$ \mathrm{Cov}(u, z) = \frac{1}{M-1} \sum_{j=1}^{M} (u_j - \bar{u}) \otimes (z_j - \bar{z}), $$

where $\bar{u} = \frac{1}{M}(u_1 + \cdots + u_M)$ and $\bar{z} = \frac{1}{M}(z_1 + \cdots + z_M)$. This yields an analysis ensemble
$u^a = (u_1^a, \ldots, u_M^a)$ which in turn determines an empirical analysis measure

$$ \mu_M^a = \frac{1}{M} \sum_{j=1}^{M} \delta_{u_j^a}, \qquad (9) $$

where $\delta_{u_j^a}$ denotes the Dirac-measure at the point $u_j^a$. Moreover, the empirical mean of $u^a$ serves as an
estimate $\hat{u}$ for the unknown $u$ and the empirical covariance of $u^a$ as an indicator for the accuracy of the
estimate.

For dynamical systems such as (5), the analysis ensemble would be propagated by the system dynamics
and would then serve as the initial ensemble for the subsequent step $n$.
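
The three steps of the EnKF (initial ensemble, forecast, analysis) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any of the cited works: the noise is taken to be i.i.d. Gaussian with a known standard deviation, the ensemble is stored row-wise in an array, and the empirical covariances use the factor $1/(M-1)$ (which cancels in the gain). The function name `enkf_analysis` and the test problem are hypothetical.

```python
import numpy as np

def enkf_analysis(u, forward, z_obs, noise_std, rng):
    """One EnKF update (forecast + analysis) of a prior ensemble.

    u         : (M, n) array, samples of the prior
    forward   : callable mapping an (n,) vector to a (d,) vector
    z_obs     : (d,) observed data
    noise_std : standard deviation of the assumed Gaussian noise
    """
    M = u.shape[0]
    g = np.array([forward(uj) for uj in u])                 # (M, d)
    z = g + noise_std * rng.standard_normal(g.shape)        # forecast samples z_j
    du = u - u.mean(axis=0)
    dz = z - z.mean(axis=0)
    cov_uz = du.T @ dz / (M - 1)                            # empirical Cov(u, z)
    cov_zz = dz.T @ dz / (M - 1)                            # empirical Cov(z)
    K = np.linalg.solve(cov_zz, cov_uz.T).T                 # Cov(u, z) Cov(z)^{-1}
    return u + (z_obs - z) @ K.T                            # u_j + K (z - z_j)

# Hypothetical linear Gaussian test: U ~ N(0, 1), G(u) = u, eps ~ N(0, 1), z = 2.
rng = np.random.default_rng(1)
prior_ensemble = rng.standard_normal((20000, 1))
analysis = enkf_analysis(prior_ensemble, lambda v: v, np.array([2.0]), 1.0, rng)
analysis_mean = analysis.mean()
analysis_var = analysis.var()
```

In this linear Gaussian test case the analysis ensemble should have mean close to 1 and variance close to 1/2, matching the Kalman filter; for a nonlinear forward map no such agreement with the posterior should be expected.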
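
3.3 The Polynomial Chaos Kalman Filter

In [4, 30, 34, 33, 36, 35] the authors propose a sampling-free Kalman filtering scheme for nonlinear systems.
Rather than updating samples of the unknown, this is carried out for the coefficient vector of a polynomial
chaos expansion (PCE) of the unknown. This necessitates the construction of a PCE distributed according
to the prior measure $\mu_0$: we assume there exist countably many independent real-valued random variables
$\xi = (\xi_m)_{m \in \mathbb{N}}$, and chaos coefficients $u_\alpha \in \mathcal{X}$, $\varepsilon_\alpha \in \mathbb{R}^d$ for each multi-index

$$ \alpha \in \mathcal{J} := \{\alpha \in \mathbb{N}_0^{\mathbb{N}} : \alpha_j \ne 0 \text{ for only finitely many } j\}, $$

such that

$$ \sum_{\alpha \in \mathcal{J}} \|u_\alpha\|^2 < +\infty \quad \text{and} \quad \sum_{\alpha \in \mathcal{J}} |\varepsilon_\alpha|^2 < +\infty, $$

and

$$ \Big( \sum_{\alpha \in \mathcal{J}} u_\alpha P_\alpha(\xi), \; \sum_{\alpha \in \mathcal{J}} \varepsilon_\alpha P_\alpha(\xi) \Big) \sim \mu_0 \otimes \nu_\varepsilon. $$

Here, $P_\alpha(\xi) = \prod_{m \ge 1} P^{(m)}_{\alpha_m}(\xi_m)$ denotes the product of univariate orthogonal polynomials $P^{(m)}_{\alpha_m}$, where
we require $(P^{(m)}_k)_{k \in \mathbb{N}_0}$ to be a CONS in $L^2(\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathbb{P}_{\xi_m}; \mathbb{R})$. We note that the completeness of orthogonal
polynomials will depend in general on properties of the measure $\mathbb{P}_{\xi_m}$, see [10] for a complete characterization.

We then define $U := \sum_{\alpha \in \mathcal{J}} u_\alpha P_\alpha(\xi)$ and $\varepsilon := \sum_{\alpha \in \mathcal{J}} \varepsilon_\alpha P_\alpha(\xi)$ and operate on the chaos coefficients $(u_\alpha)_{\alpha \in \mathcal{J}}$
and $(\varepsilon_\alpha)_{\alpha \in \mathcal{J}}$. However, for numerical simulations we have to truncate the PCE and, therefore, introduce the
projection

$$ P_J : U \mapsto \sum_{\alpha \in J} u_\alpha P_\alpha(\xi), \qquad J \subset \mathcal{J}. $$

To simplify notation we further define for $J \subset \mathcal{J}$ the following two RVs

$$ U_J := P_J U \quad \text{and} \quad Z_J := P_J(G(U_J) + \varepsilon). $$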


Due to the nonlinearity of $G$ there holds in general $P_J G(U) \neq G(P_J U) \neq P_J G(U_J)$ and, hence, $Z_J \neq P_J Z$.
In particular, we will consider finite subsets $J$, and for convergence studies we usually assume a
monotone and exhaustive sequence of such finite subsets $(J_n)_{n \in \mathbb{N}}$, i.e., $J_m \subset J_n$ for $m \leq n$ and $J_n \uparrow \mathcal{J}$, e.g.,

$$J_n := \Big\{ \alpha \in \mathcal{J} : \alpha_j = 0 \;\; \forall j > n, \; \sum_{j=1}^{\infty} |\alpha_j| \leq n \Big\}.$$

We note that for $n \to \infty$ the error $\|U - U_{J_n}\|_{L^2(\mathcal{X})}$ will tend to zero since $J_n \uparrow \mathcal{J}$. However, $L^2$-convergence
is in general not preserved under continuous mappings (unlike convergence in the almost sure
sense, in probability and in distribution). Thus, although there holds $\|U - U_{J_n}\|_{L^2(\mathcal{X})} \to 0$ and, of course,
$\|G(U) - P_{J_n} G(U)\|_{L^2(\mathbb{R}^d)} \to 0$, the continuity of $G$ does not imply $\|G(U) - P_{J_n} G(U_{J_n})\|_{L^2(\mathbb{R}^d)} \to 0$ in
general. However, if we assume for a $\delta > 0$ that there exists $C < +\infty$ such that

$$\mathbb{E}\big[ |G(U_{J_n})|^{2+\delta} \big] \leq C \qquad \forall n \in \mathbb{N}, \qquad (10)$$

the desired convergence $\|Z - Z_{J_n}\|_{L^2(\mathbb{R}^d)} \to 0$ follows, see the proof of Theorem 9 for details.

For the same problem considered for the EnKF, the PCKF algorithm now reads as follows:

1. Initialization: Choose a finite subset $J \subset \mathcal{J}$ and compute the chaos coefficients $(u_\alpha)_{\alpha \in J}$ of $U \sim \mu_0$.

2. Forecast: Compute the chaos coefficients $(g_{J,\alpha})_{\alpha \in J}$ of $G(U_J)$ and set

$$z_{J,\alpha} := g_{J,\alpha} + \varepsilon_\alpha \qquad \forall \alpha \in J,$$

where $(\varepsilon_\alpha)_{\alpha \in J}$ are the chaos coefficients of $\varepsilon$. By linearity, $z_{J,\alpha}$ are the chaos coefficients of $Z_J$.

3. Analysis: Update the initial chaos coefficients by

$$u^a_{J,\alpha} := u_\alpha + K_J(\delta_{\alpha 0}\, z - z_{J,\alpha}) \qquad \forall \alpha \in J, \qquad (11)$$

where $\delta_{\alpha 0}$ is the Kronecker symbol for multi-indices, $(\delta_{\alpha 0}\, z)_{\alpha \in J} = (z, 0, \dots, 0)$ the chaos coefficients
of the observed data $z \in \mathbb{R}^d$ and $K_J := \mathrm{Cov}(U_J, Z_J)\,\mathrm{Cov}(Z_J)^{-1}$. The action of the covariances as
linear operators can be described in the case of $\mathrm{Cov}(U_J, Z_J) : \mathbb{R}^d \to \mathcal{X}$ by

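$$\mathrm{Cov}(U_J, Z_J)\, x = \sum_{\alpha \in J \setminus \{0\}} (z_{J,\alpha}^\top x)\, u_\alpha, \qquad x \in \mathbb{R}^d,$$

which follows from the orthonormality of the $P_\alpha$ together with $P_0 \equiv 1$.

Thus, the result of one step of the PCKF algorithm is an analysis chaos coefficient vector $(u^a_{J,\alpha})_{\alpha \in J}$, which in
turn determines a RV

$$U^a_J := \sum_{\alpha \in J} u^a_{J,\alpha} P_\alpha(\xi).$$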

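Remark 8. An expansion in polynomials $P_\alpha(\xi)$ is not crucial for the application of the PCKF. In principle,
any countable CONS $(\Psi_\alpha)_\alpha$ of the space $L^2(\Omega, \mathcal{F}, \mathbb{P}; \mathbb{R})$ such that $\big( \sum_\alpha u_\alpha \Psi_\alpha, \sum_\alpha \varepsilon_\alpha \Psi_\alpha \big) \sim \mu_0 \otimes \nu_\varepsilon$
would be suitable.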
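The three PCKF steps can be illustrated for a scalar unknown. The following sketch is an assumption-laden toy and not the scheme of the cited references: it takes $U = \xi_1 \sim N(0,1)$ and $\varepsilon = \sigma \xi_2$, uses normalized Hermite polynomials as the basis, and computes the chaos coefficients of $G(U_J)$ by Gauss-Hermite quadrature rather than by a stochastic Galerkin method. The function name `pckf_scalar` and its parameters are hypothetical.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite_e import hermegauss, hermeval

def pckf_scalar(G, z_obs, sigma, degree, n_quad=40):
    """PCKF update (11) for scalar U = xi_1 ~ N(0,1), eps = sigma * xi_2.

    Basis ordering: [1, H_1(xi_1), ..., H_degree(xi_1), H_1(xi_2)] with
    normalized Hermite polynomials H_k = He_k / sqrt(k!); degree >= 1.
    """
    x, w = hermegauss(n_quad)          # Gauss nodes/weights for exp(-x^2/2)
    w = w / sqrt(2.0 * pi)             # normalize to the N(0,1) density
    H = np.array([hermeval(x, np.eye(degree + 1)[k]) / sqrt(factorial(k))
                  for k in range(degree + 1)])
    g = H @ (w * G(x))                 # chaos coefficients of G(U_J)
    u = np.zeros(degree + 2)
    u[1] = 1.0                         # initialization: U = H_1(xi_1)
    z = np.append(g, sigma)            # forecast: z_alpha = g_alpha + eps_alpha
    cov_uz = u[1:] @ z[1:]             # covariances use only alpha != 0
    cov_zz = z[1:] @ z[1:]
    K = cov_uz / cov_zz
    data = np.zeros(degree + 2)
    data[0] = z_obs                    # coefficients (z, 0, ..., 0) of the data
    return u + K * (data - z), K       # analysis coefficients and gain K_J

u_lin, K_lin = pckf_scalar(lambda u: u, z_obs=2.0, sigma=1.0, degree=3)
u_sq, K_sq = pckf_scalar(lambda u: u**2, z_obs=9.0, sigma=0.5, degree=4)
```

For the linear map the updated coefficients encode mean 1 and variance 0.5, the exact Gaussian posterior for these inputs. For $G(u) = u^2$ the gain vanishes and the coefficients remain unchanged.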

3.4 The Analysis Variable 

Both EnKF and PCKF perform discretized versions of an update for RVs, namely,

$$U^a = U + K(z - Z), \qquad K = \mathrm{Cov}(U, Z)\,\mathrm{Cov}(Z)^{-1}, \qquad (12)$$

where $Z := G(U) + \varepsilon$ and $(U, \varepsilon) \sim \mu_0 \otimes \nu_\varepsilon$, providing samples or chaos coefficients of $U^a$, respectively.
However, the output of both methods is corrupted by the approximation of the Kalman gain operator
$K$ by the empirical covariances and the operator $K_J$, respectively. That both methods do indeed converge to
$U^a$ in some sense for increasing sample size $M$ or increasing chaos coefficient subset $J_n$ is shown by the next
two theorems.

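Theorem 9. Consider the model (2) and let Assumptions 1, 2 and 5 be satisfied. If $(J_n)_{n \in \mathbb{N}}$ is a monotone
and exhaustive sequence of finite subsets of $\mathcal{J}$ with $0 \in J_1$ such that (10) holds, then $\|Z - Z_{J_n}\|_{L^2(\mathbb{R}^d)} \to 0$
for $n \to \infty$. Moreover, if

$$U^a_{J_n} = \sum_{\alpha \in J_n} u^a_{J_n,\alpha} P_\alpha(\xi)$$

denotes the RV generated by the PCKF in the analysis step for the subset $J = J_n$, we have

$$\|U^a - U^a_{J_n}\|_{L^2(\mathcal{X})} \in O\big( \|U - U_{J_n}\|_{L^2(\mathcal{X})} + \|Z - Z_{J_n}\|_{L^2(\mathbb{R}^d)} \big), \qquad (13)$$

which means in particular that $U^a_{J_n} \to U^a$ in $L^2(\mathcal{X})$ as $n \to \infty$.

Proof. In the following we use $\|\cdot\|_{L^2}$ as shorthand for $\|\cdot\|_{L^2(\mathcal{X})}$ and $\|\cdot\|_{L^2(\mathbb{R}^d)}$, respectively. Since $(J_n)_{n \in \mathbb{N}}$ is
exhaustive, we have $U_{J_n} \to U$ in $L^2(\mathcal{X})$ and hence $U_{J_n} \xrightarrow{\mathbb{P}} U$, where $\xrightarrow{\mathbb{P}}$ denotes convergence in probability.
Since $G$ is continuous, it follows by the continuous mapping theorem [20, Lemma 3.3] that also $G(U_{J_n}) \xrightarrow{\mathbb{P}} G(U)$.
Now the boundedness assumption (10) implies the uniform integrability of the RVs $|G(U_{J_n})|^2$, $n \in \mathbb{N}$,
see [20, p. 44], and by [20, Proposition 3.12] we then obtain $G(U_{J_n}) \to G(U)$ in $L^2(\mathbb{R}^d)$. Thus,

$$\|Z - Z_{J_n}\|_{L^2} \leq \|Z - P_{J_n} Z\|_{L^2} + \|P_{J_n}(Z - G(U_{J_n}) - \varepsilon)\|_{L^2} \to 0,$$

where the first term tends to zero because $(J_n)_{n \in \mathbb{N}}$ is exhaustive and the second term is bounded by $\|G(U) - G(U_{J_n})\|_{L^2} \to 0$.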


Now consider $J$ as an arbitrary subset. Since $U^a = U + K(z - Z)$ and $U^a_J = U_J + K_J(z - Z_J)$, we have

$$\|U^a - U^a_J\|_{L^2} \leq \|U - U_J\|_{L^2} + \|K - K_J\| \, \|z - Z_J\|_{L^2} + \|K\| \, \|Z - Z_J\|_{L^2},$$

where the norm for $K$ and $K - K_J$ is the usual operator norm for linear mappings from $\mathbb{R}^d$ to $\mathcal{X}$. It is clear
that we can estimate

$$\|z - Z_J\|_{L^2} \leq |z| + \|Z\|_{L^2},$$

because $\|Z_J\|_{L^2} \leq \|Z\|_{L^2}$. Considering $\|K - K_J\|$, we can further split this error into

$$\|K - K_J\| \leq \|\mathrm{Cov}(U, Z) - \mathrm{Cov}(U_J, Z_J)\| \, \|\mathrm{Cov}^{-1}(Z)\| + \|\mathrm{Cov}(U_J, Z_J)\| \, \|\mathrm{Cov}^{-1}(Z) - \mathrm{Cov}^{-1}(Z_J)\|.$$

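Next, we recall that the covariance operator $\mathrm{Cov}(X, Y)$ depends continuously on $X$ and $Y$; in particular we
have for zero-mean Hilbert space-valued RVs $X_1, X_2 \in L^2(\mathcal{X})$ and $Y_1, Y_2 \in L^2(\mathcal{Y})$

$$\|\mathrm{Cov}(X_1, Y_1) - \mathrm{Cov}(X_2, Y_2)\| = \|\mathbb{E}[X_1 \otimes Y_1] - \mathbb{E}[X_2 \otimes Y_2]\|$$
$$\leq \mathbb{E}\big[ \|(X_1 - X_2) \otimes Y_1\| + \|X_2 \otimes (Y_1 - Y_2)\| \big]$$
$$= \mathbb{E}\big[ \|X_1 - X_2\| \, \|Y_1\| \big] + \mathbb{E}\big[ \|X_2\| \, \|Y_1 - Y_2\| \big]$$
$$\leq \big( \|Y_1\|_{L^2} + \|X_2\|_{L^2} \big) \big( \|X_1 - X_2\|_{L^2} + \|Y_1 - Y_2\|_{L^2} \big),$$

where we have used Jensen's and the triangle inequality in the second line and the Cauchy-Schwarz inequality
in the last line. Since $\mathrm{Cov}(X, Y) = \mathrm{Cov}(X - \mathbb{E}[X], Y - \mathbb{E}[Y])$ and $\|X - \mathbb{E}[X]\|_{L^2} \leq \|X\|_{L^2}$, the above
estimate holds also for non-zero-mean RVs. Thus, we get

$$\|\mathrm{Cov}(U, Z) - \mathrm{Cov}(U_J, Z_J)\| \leq \big( \|U\|_{L^2} + \|Z\|_{L^2} \big) \big( \|U - U_J\|_{L^2} + \|Z - Z_J\|_{L^2} \big)$$

and

$$\|\mathrm{Cov}(Z) - \mathrm{Cov}(Z_J)\| \leq 4 \|Z\|_{L^2} \, \|Z - Z_J\|_{L^2},$$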

due to $\|Z_J\|_{L^2} \leq \|Z\|_{L^2}$. Now consider again the assumed monotone and exhaustive sequence $(J_n)_{n \in \mathbb{N}}$ and
recall that, by taking a sufficiently large $n$, the errors $\|U - U_{J_n}\|_{L^2}$ and $\|Z - Z_{J_n}\|_{L^2}$ can be made arbitrarily
small. Thus, also $\|\mathrm{Cov}(Z) - \mathrm{Cov}(Z_{J_n})\|$ will tend to zero as $n \to \infty$. We now apply the continuity of
the matrix inverse to $\mathrm{Cov}(Z), \mathrm{Cov}(Z_{J_n}) \in \mathbb{R}^{d \times d}$. Specifically, if $n$ is sufficiently large that

$$\|\mathrm{Cov}(Z) - \mathrm{Cov}(Z_{J_n})\| < \frac{1}{2 \|\mathrm{Cov}^{-1}(Z)\|},$$

then there holds

$$\|\mathrm{Cov}^{-1}(Z) - \mathrm{Cov}^{-1}(Z_{J_n})\| \leq 2 \|\mathrm{Cov}^{-1}(Z)\|^2 \, \|\mathrm{Cov}(Z) - \mathrm{Cov}(Z_{J_n})\|$$

(see [17, Sect. 5.8]). Summing up all previous estimates, we obtain

$$\|K - K_{J_n}\| \leq C_1 \big( \|U - U_{J_n}\|_{L^2} + \|Z - Z_{J_n}\|_{L^2} \big) + C_2 \|Z - Z_{J_n}\|_{L^2},$$

with $C_1 = \|\mathrm{Cov}^{-1}(Z)\| \big( \|U\|_{L^2} + \|Z\|_{L^2} \big)$ and $C_2 = 8 \|U\|_{L^2} \, \|\mathrm{Cov}^{-1}(Z)\|^2 \, \|Z\|^2_{L^2}$, where we have used

$$\|\mathrm{Cov}(U_J, Z_J)\| \leq \|U_J\|_{L^2} \, \|Z_J\|_{L^2} \leq \|U\|_{L^2} \, \|Z\|_{L^2}$$

to obtain $C_2$. Finally, we arrive at

$$\|U^a - U^a_{J_n}\|_{L^2} \leq \|U - U_{J_n}\|_{L^2} + \big( |z| + \|Z\|_{L^2} \big) \|K - K_{J_n}\| + \|K\| \, \|Z - Z_{J_n}\|_{L^2}$$
$$\leq C \big( \|U - U_{J_n}\|_{L^2} + \|Z - Z_{J_n}\|_{L^2} \big)$$

with $C = 1 + \|K\| + \big( |z| + \|Z\|_{L^2} \big)(C_1 + C_2)$, and the assertion follows.

□
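The perturbation bound for the matrix inverse used in the proof can be checked numerically. The sketch below is only an illustration on a randomly generated symmetric positive definite matrix standing in for $\mathrm{Cov}(Z)$; the matrix size, seed and perturbation scale are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
B = rng.normal(size=(d, d))
A = B @ B.T + np.eye(d)                 # stand-in for Cov(Z): symmetric positive definite
A_inv = np.linalg.inv(A)
norm_inv = np.linalg.norm(A_inv, 2)     # spectral norm of the inverse

# Perturbation E with ||E|| = 0.4 / ||A^{-1}|| < 1 / (2 ||A^{-1}||)
E = rng.normal(size=(d, d))
E = 0.4 / norm_inv * E / np.linalg.norm(E, 2)

lhs = np.linalg.norm(A_inv - np.linalg.inv(A + E), 2)
rhs = 2.0 * norm_inv**2 * np.linalg.norm(E, 2)
```

Here `lhs` should not exceed `rhs`, in line with the estimate quoted from [17, Sect. 5.8].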


Remark 10. Since for many applications evaluating the forward map $G$ corresponds to solving a differential
or integral equation, an additional error arises due to numerical approximations $G_h$ of $G$. This error affects
the filters in that they sample, or compute chaos coefficients of, $Z_h = G_h(U) + \varepsilon$ rather than $Z$. We neglect
this error in our analysis since it is beyond the scope of this work. However, if $G$ is the solution operator for
differential equations, we expect that (10) could be verified in many cases, such as for elliptic boundary value
problems with $U$ a random diffusion coefficient or source term.

A first convergence analysis for the EnKF when the sample size tends to infinity was carried out in
[25]. There the authors considered finite-dimensional linear systems, and their main goal was to show the
convergence of the ensemble mean and covariance to the true posterior mean and covariance in $L^p(\mathbb{R}^n)$ and
$L^p(\mathbb{R}^{n \times n})$, respectively. We will now show the $\mathbb{P}$-almost sure convergence of the empirical distribution $\tilde{\mu}^a_M$
defined by the EnKF analysis ensemble to the distribution of $U^a$.

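Theorem 11. Given the model (2) under Assumptions 1, 2 and 5, let $(u^a_1, \dots, u^a_M)$ denote the analysis
ensemble resulting from the EnKF and $\tilde{\mu}^a_M$ the associated empirical measure (9). Further, let $\mu^a$ denote the
push-forward measure of the analysis variable $U^a$. Then, for any $f : \mathcal{X} \to \mathcal{Y}$ which satisfies

$$\|f(u) - f(v)\|_{\mathcal{Y}} \leq C \big( 1 + \|u\|_{\mathcal{X}} + \|v\|_{\mathcal{X}} \big) \|u - v\|_{\mathcal{X}} \qquad \forall u, v \in \mathcal{X},$$

where $\mathcal{Y}$ is any separable Hilbert space, there holds

$$\lim_{M \to \infty} \frac{1}{M} \sum_{j=1}^{M} f(u^a_j) = \mathbb{E}[f(U^a)] \qquad \mathbb{P}\text{-a.s.}$$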

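This implies, in particular,

$$\tilde{\mu}^a_M \to \mu^a \quad \text{weakly, } \mathbb{P}\text{-a.s.},$$

and

$$\frac{1}{M} \sum_{j=1}^{M} u^a_j \to \mathbb{E}[U^a] \qquad \mathbb{P}\text{-a.s.}$$

as well as

$$\mathrm{Cov}(\mathbf{u}^a) \to \mathrm{Cov}(U^a) \qquad \mathbb{P}\text{-a.s.}$$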

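Proof. We denote by $U_i$ and $\varepsilon_i$, $i \in \mathbb{N}$, i.i.d. RVs such that $(U_i, \varepsilon_i) \sim \mu_0 \otimes \nu_\varepsilon$. Further, we define

$$U^a_i := U_i + K(z - Z_i), \qquad K = \mathrm{Cov}(U_i, Z_i)\,\mathrm{Cov}^{-1}(Z_i),$$

where $Z_i := G(U_i) + \varepsilon_i$, and

$$X^a_{M,i} := U_i + K_M(z - Z_i), \qquad K_M = \mathrm{Cov}(\mathbf{U}_M, \mathbf{Z}_M)\,\mathrm{Cov}^{-1}(\mathbf{Z}_M),$$

where $\mathrm{Cov}(\mathbf{U}_M, \mathbf{Z}_M)$ and $\mathrm{Cov}(\mathbf{Z}_M)$ are empirical covariances, e.g.,

$$\mathrm{Cov}(\mathbf{U}_M, \mathbf{Z}_M) = \frac{1}{M-1} \sum_{i=1}^{M} (U_i - \bar{U}_M) \otimes (Z_i - \bar{Z}_M)$$

with $\bar{U}_M = \frac{1}{M}(U_1 + \dots + U_M)$ and $\bar{Z}_M = \frac{1}{M}(Z_1 + \dots + Z_M)$. Note that $(X^a_{M,1}, \dots, X^a_{M,M})$ represents the
random analysis ensemble of the EnKF and that $U^a_i \sim \mu^a$ i.i.d. For any function $f : \mathcal{X} \to \mathcal{Y}$ which fulfills
the assumptions stated in the theorem, we have

$$\frac{1}{M} \sum_{i=1}^{M} f(X^a_{M,i}) = \frac{1}{M} \sum_{i=1}^{M} f(U^a_i) + \frac{1}{M} \sum_{i=1}^{M} \big( f(X^a_{M,i}) - f(U^a_i) \big),$$

where there holds

$$\lim_{M \to \infty} \frac{1}{M} \sum_{i=1}^{M} f(U^a_i) = \mathbb{E}[f(U^a)] \qquad \mathbb{P}\text{-a.s.}$$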

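due to the strong law of large numbers (SLLN) [29]. Hence, we need only ensure that

$$\frac{1}{M} \sum_{i=1}^{M} \|f(X^a_{M,i}) - f(U^a_i)\| \leq \frac{C}{M} \sum_{i=1}^{M} \big( 1 + \|U^a_i\| + \|X^a_{M,i}\| \big) \|X^a_{M,i} - U^a_i\|$$
$$\leq C \Big( \frac{1}{M} \sum_{i=1}^{M} \big( 1 + \|U^a_i\| + \|X^a_{M,i}\| \big)^2 \Big)^{1/2} \Big( \frac{1}{M} \sum_{i=1}^{M} \|X^a_{M,i} - U^a_i\|^2 \Big)^{1/2}$$

converges $\mathbb{P}$-a.s. to 0 as $M \to \infty$ to prove the first statement. We estimate

$$\|X^a_{M,i} - U^a_i\| \leq \|K - K_M\| \, |z - Z_i| \qquad \forall i \in \mathbb{N},$$

where we can further split

$$K - K_M = \big( \mathrm{Cov}(U, Z) - \mathrm{Cov}(\mathbf{U}_M, \mathbf{Z}_M) \big) \mathrm{Cov}^{-1}(Z) + \mathrm{Cov}(\mathbf{U}_M, \mathbf{Z}_M) \big( \mathrm{Cov}^{-1}(Z) - \mathrm{Cov}^{-1}(\mathbf{Z}_M) \big).$$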

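Next, we recall that the empirical covariance converges $\mathbb{P}$-almost surely to the true covariance, which follows
easily (see [27, Satz 3.14] for the scalar case) by writing

$$\mathrm{Cov}(\mathbf{U}_M, \mathbf{Z}_M) = \frac{1}{M-1} \sum_{i=1}^{M} (U_i - \mathbb{E}[U]) \otimes (Z_i - \mathbb{E}[Z]) - \frac{M}{M-1} (\bar{U}_M - \mathbb{E}[U]) \otimes (\bar{Z}_M - \mathbb{E}[Z]).$$

Then by the SLLN we get

$$\frac{1}{M-1} \sum_{i=1}^{M} (U_i - \mathbb{E}[U]) \otimes (Z_i - \mathbb{E}[Z]) \to \mathbb{E}\big[ (U - \mathbb{E}[U]) \otimes (Z - \mathbb{E}[Z]) \big]$$

and $\frac{M}{M-1} (\bar{U}_M - \mathbb{E}[U]) \otimes (\bar{Z}_M - \mathbb{E}[Z]) \to 0$ $\mathbb{P}$-almost surely. Thus, we have

$$\mathrm{Cov}(U, Z) - \mathrm{Cov}(\mathbf{U}_M, \mathbf{Z}_M) \to 0 \quad \text{and} \quad \mathrm{Cov}(Z) - \mathrm{Cov}(\mathbf{Z}_M) \to 0$$

$\mathbb{P}$-almost surely. Since the matrix inverse is a continuous mapping there also follows

$$\mathrm{Cov}^{-1}(Z) - \mathrm{Cov}^{-1}(\mathbf{Z}_M) \to 0 \qquad \mathbb{P}\text{-a.s.}$$

and hence $K - K_M \to 0$ as $M \to \infty$ $\mathbb{P}$-almost surely. We thus have for $p \in [1, 2]$ $\mathbb{P}$-a.s.

$$\lim_{M \to \infty} X^a_{M,i} = U^a_i \;\; \forall i \in \mathbb{N} \quad \text{and} \quad \lim_{M \to \infty} \frac{1}{M} \sum_{i=1}^{M} \|X^a_{M,i} - U^a_i\|^p = 0,$$

since by the SLLN $\frac{1}{M} \sum_{i=1}^{M} |z - Z_i|^p$ will tend to $\mathbb{E}[|z - Z|^p]$ $\mathbb{P}$-almost surely. Moreover, there holds

$$\big( 1 + \|U^a_i\| + \|X^a_{M,i}\| \big)^2 \leq \big( 1 + 2\|U^a_i\| + \|X^a_{M,i} - U^a_i\| \big)^2 \leq 2 \big( 1 + 2\|U^a_i\| \big)^2 + 2 \|X^a_{M,i} - U^a_i\|^2,$$

which yields, again by the SLLN and the above arguments,

$$\frac{1}{M} \sum_{i=1}^{M} \big( 1 + \|U^a_i\| + \|X^a_{M,i}\| \big)^2 \leq \frac{2}{M} \sum_{i=1}^{M} \big( 1 + 2\|U^a_i\| \big)^2 + \frac{2}{M} \sum_{i=1}^{M} \|X^a_{M,i} - U^a_i\|^2 \to 2\, \mathbb{E}\big[ (1 + 2\|U^a\|)^2 \big]$$

as $M \to \infty$ $\mathbb{P}$-a.s. We thus finally obtain

$$\frac{1}{M} \sum_{i=1}^{M} \|f(X^a_{M,i}) - f(U^a_i)\| \xrightarrow{M \to \infty} 0 \qquad \mathbb{P}\text{-a.s.},$$

proving the first statement of the theorem. The remaining three then follow immediately. □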

4 Bayesian Interpretation of Generalized Kalman Filters 

In the previous section we have characterized the limit of the EnKF and PCKF approximations for increasing
sample size or polynomial basis, respectively. We now investigate how this limit, the analysis variable $U^a$,
may be understood in the context of Bayesian inverse problems. By analyzing the properties of this RV we are
able to characterize those of the approximations provided by the two Kalman filtering methods. In particular,
we show that these do not, in general, solve the nonlinear Bayesian inverse problem, nor can they even be
justified as approximations to its solution. They are, rather, related to a linear approximation of the Bayes
estimator $\phi_{\mathrm{CM}}$ and its estimation error.

4.1 The Linear Conditional Mean 

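The quantity known in classical statistics as the best linear unbiased estimator (BLUE) corresponds in the
Bayesian setting to the linear posterior mean estimator $\phi_{\mathrm{LCM}}$ defined as

$$\phi_{\mathrm{LCM}} = \operatorname*{argmin}_{\phi \in \mathcal{P}_1(\mathbb{R}^d; \mathcal{X})} \mathbb{E}\big[ \|U - \phi(Z)\|^2 \big], \qquad (14)$$

where $\mathcal{P}_1(\mathbb{R}^d; \mathcal{X}) = \{\phi : \phi(z) = b + Az \text{ with } b \in \mathcal{X}, A \in \mathcal{L}(\mathbb{R}^d; \mathcal{X})\}$ denotes the set of all linear mappings
from $\mathbb{R}^d$ to $\mathcal{X}$. Moreover, we refer to the RV $\phi_{\mathrm{LCM}}(Z)$ as the linear conditional mean. Recall that the
conditional mean $\phi_{\mathrm{CM}}(Z) = \mathbb{E}[U|Z]$ is the best approximation of $U$ in $L^2(\Omega, \sigma(Z), \mathbb{P}; \mathcal{X})$ w.r.t. the $L^2(\mathcal{X})$-norm.
Thus $\phi_{\mathrm{LCM}}(Z)$ can be seen as the best approximation of $U$ in the subspace $\mathcal{P}_1(Z; \mathcal{X}) \subset L^2(\Omega, \sigma(Z), \mathbb{P}; \mathcal{X})$,
where $\mathcal{P}_1(Z; \mathcal{X})$ is short for $\mathcal{P}_1(\mathbb{R}^d; \mathcal{X}) \circ Z = \{\phi(Z), \phi \in \mathcal{P}_1(\mathbb{R}^d; \mathcal{X})\}$.

Lemma 12. The linear conditional mean as defined in (14) is given by

$$\phi_{\mathrm{LCM}}(Z) = \mathbb{E}[U] + \mathrm{Cov}(U, Z)\,\mathrm{Cov}(Z)^{-1}(Z - \mathbb{E}[Z]).$$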

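Proof. The assertion follows by verifying that

$$\hat{\phi}(Z) = \mathbb{E}[U] + K(Z - \mathbb{E}[Z]), \qquad K = \mathrm{Cov}(U, Z)\,\mathrm{Cov}(Z)^{-1},$$

coincides with the orthogonal projection of $U$ to $\mathcal{P}_1(Z; \mathcal{X})$. To do so, we will show that $U - \hat{\phi}(Z)$ is orthogonal
to $\mathcal{P}_1(Z; \mathcal{X})$ w.r.t. the inner product in $L^2(\Omega, \mathcal{F}, \mathbb{P}; \mathcal{X})$.

Let $b \in \mathcal{X}$ and $A \in \mathcal{L}(\mathbb{R}^d; \mathcal{X})$ be arbitrary. Then there holds

$$\mathbb{E}\big[ \langle U - \hat{\phi}(Z), b + AZ \rangle \big] = \underbrace{\mathbb{E}\big[ \langle U - \mathbb{E}[U], b \rangle \big]}_{=0} + \mathbb{E}\big[ \langle U - \mathbb{E}[U], AZ \rangle \big] - \mathbb{E}\big[ \langle K(Z - \mathbb{E}[Z]), AZ \rangle \big] - \underbrace{\mathbb{E}\big[ \langle K(Z - \mathbb{E}[Z]), b \rangle \big]}_{=0}$$
$$= \mathbb{E}\big[ \langle U - \mathbb{E}[U], A(Z - \mathbb{E}[Z]) \rangle \big] - \mathbb{E}\big[ \langle K(Z - \mathbb{E}[Z]), A(Z - \mathbb{E}[Z]) \rangle \big]$$
$$= \operatorname{tr}\big( \mathrm{Cov}(U, Z) A^* \big) - \operatorname{tr}\big( K\, \mathrm{Cov}(Z) A^* \big) = 0,$$

since

$$\mathbb{E}\big[ \langle U - \mathbb{E}[U], A\,\mathbb{E}[Z] \rangle \big] = \mathbb{E}\big[ \langle K(Z - \mathbb{E}[Z]), A\,\mathbb{E}[Z] \rangle \big] = 0$$

and $\mathrm{Cov}(AX, BY) = A\,\mathrm{Cov}(X, Y)B^*$ for Hilbert space-valued RVs $X, Y$ and bounded linear operators
$A, B$. □

We note that Lemma 12 fails to hold in case $\mathcal{X}$ is only a separable Banach space, since then the
expectation $\mathbb{E}[U]$ and covariance $\mathrm{Cov}(U, Z)$ no longer minimize $\mathbb{E}[\|U - b\|^2]$, $b \in \mathcal{X}$, and $\mathbb{E}[\|U - AZ\|^2]$,
$A \in \mathcal{L}(\mathbb{R}^d; \mathcal{X})$, respectively; see also Remark 7.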
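On the level of samples, the statement of Lemma 12 can be checked directly: an affine least-squares fit of $U$ on $Z$ reproduces the gain built from empirical covariances. The following sketch uses a hypothetical nonlinear forward map and noise level, chosen here only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5000
U = rng.normal(size=(M, 2))
# Hypothetical nonlinear forward map plus noise (illustration only)
Z = np.column_stack([np.exp(U[:, 0]) + U[:, 1], U[:, 1] ** 3])
Z = Z + 0.1 * rng.normal(size=(M, 2))

# Affine least-squares fit U ~ b + A Z, i.e. the sample version of (14)
X = np.column_stack([np.ones(M), Z])
coef, *_ = np.linalg.lstsq(X, U, rcond=None)
b_ls, A_ls = coef[0], coef[1:].T

# Formula of Lemma 12 with empirical covariances
dU, dZ = U - U.mean(axis=0), Z - Z.mean(axis=0)
K = (dU.T @ dZ) @ np.linalg.inv(dZ.T @ dZ)
b_cov = U.mean(axis=0) - K @ Z.mean(axis=0)
```

Up to floating-point error `A_ls` agrees with `K` and `b_ls` with `b_cov`, so the EnKF gain is the slope of the best affine predictor of $U$ from $Z$.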

4.2 Interpretation of the Analysis Variable 

Lemma 12 immediately yields a characterization of the analysis variable $U^a$ defined in (12).

Theorem 13. Let Assumptions 1, 2 and 5 be satisfied for the model (2). Then for any $z \in \mathbb{R}^d$ the analysis
variable $U^a = U + K(z - Z)$, $K = \mathrm{Cov}(U, Z)\,\mathrm{Cov}(Z)^{-1}$, coincides with

$$U^a = \phi_{\mathrm{LCM}}(z) + \big( U - \phi_{\mathrm{LCM}}(Z) \big).$$

In particular, there holds

$$\mathbb{E}[U^a] = \phi_{\mathrm{LCM}}(z) \quad \text{and} \quad \mathrm{Cov}(U^a) = \mathrm{Cov}(U) - K\, \mathrm{Cov}(Z, U).$$

We summarize the consequences of Theorem 13 as follows: 

• The analysis variable $U^a$, to which the EnKF and the PCKF provide approximations, is the sum of
a Bayes estimate $\phi_{\mathrm{LCM}}(z)$ and the prior error $U - \phi_{\mathrm{LCM}}(Z)$ of the corresponding Bayes estimator $\phi_{\mathrm{LCM}}$.

• The means of the EnKF analysis ensemble and of the PCKF analysis vector provide approximations to the linear
posterior mean estimate. How far the latter deviates from the true posterior mean depends on the model
and observation $z$.

• The covariance approximated by the empirical covariance of the EnKF analysis ensemble, as well as
that of the PCKF analysis vector, is independent of the actual observational data $z \in \mathbb{R}^d$. It therefore
constitutes a prior rather than a posterior measure of uncertainty.

• In particular, the randomness in $U^a$ is entirely determined by the prior measures $\mu_0$ and $\nu_\varepsilon$. Only
the location, i.e., the mean, of $U^a$ is influenced by the observation data $z$; the randomness of $U^a$ is
independent of $z$ and determined only by the projection error $U - \phi_{\mathrm{LCM}}(Z)$ w.r.t. the prior measures.

• In view of the last two items, the analysis variable $U^a$, and therefore the EnKF analysis ensemble or
the result of the PCKF, are in general not distributed according to the posterior measure $\mu^z$. Moreover,
the difference between $\mu^z$ and the distribution of $U^a$ depends on the data $z$ and can become quite large
for nonlinear problems, see Example 15.
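The data independence of the fluctuations can be made concrete with a small ensemble experiment. The sketch below assumes a hypothetical scalar forward map $G(u) = \exp(u)$ and noise level 0.5 (neither is from the text) and applies the update (12) with empirical covariances for two different observations.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 100_000
U = rng.normal(size=M)
Z = np.exp(U) + 0.5 * rng.normal(size=M)   # hypothetical G(u) = exp(u), noise std 0.5

cov_uz = np.cov(U, Z)[0, 1]
K = cov_uz / np.var(Z, ddof=1)             # scalar gain from empirical covariances

def analysis(z):
    """Analysis ensemble U + K (z - Z) for observed data z."""
    return U + K * (z - Z)

Ua_1, Ua_2 = analysis(1.0), analysis(4.0)
shift = Ua_2.mean() - Ua_1.mean()          # equals K * (4.0 - 1.0)
var_a = np.var(Ua_1, ddof=1)               # equals Var(U) - K * Cov(Z, U)
```

The two centered analysis ensembles coincide member by member; only the mean moves, by $K$ times the difference of the observations.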

Remark 14. In particular, the second and third items above explain the observations made in [22] that “[...]
(i) with appropriate parameter choices, approximate filters can perform well in reproducing the mean of the
desired probability distribution, (ii) they do not perform as well in reproducing the covariance [...]”.


We illustrate the conceptual difference between the distribution of the analysis variable $U^a$ and the posterior
measure $\mu^z$ with a simple yet striking example.

Example 15. We consider $U \sim N(0, 1)$, $\varepsilon \sim N(0, \sigma^2)$ and $G(u) = u^2$. Given data $z \in \mathbb{R}$, the posterior
measure, obtained from Bayes' rule for the densities, is

$$\mu^z(du) = C \exp\Big( -\frac{\sigma^2 u^2 + (z - u^2)^2}{2\sigma^2} \Big)\, du.$$

Due to the symmetry of $\mu^z$ we have $u_{\mathrm{CM}} = \phi_{\mathrm{CM}}(z) = 0$ for any $z \in \mathbb{R}$. Thus, $\mathbb{E}[U|Z] \equiv 0$ and
$\phi_{\mathrm{LCM}} \equiv \phi_{\mathrm{CM}}$. In particular, we have $K = 0$ due to

$$\mathrm{Cov}(U, Z) = \mathrm{Cov}(U, U^2) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} u(u^2 - 1)\, e^{-u^2/2}\, du = 0,$$

which in turn yields $U^a = U \sim N(0, 1)$. Thus, the analysis variable is distributed according to the prior
measure. This is not surprising as, by definition, its mean is the best linear approximation to the posterior
mean according to $\mu^z$ and its fluctuation is simply the prior estimation error $U - \phi_{\mathrm{LCM}}(Z) = U - 0 = U$.
This illustrates that $U^a$ is suited for approximating the posterior mean, but not appropriate as a method for
uncertainty quantification for the nonlinear inverse problem. As displayed in Figure 1, the distribution of $U^a$
can be markedly different from the true posterior distribution.
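Example 15 is easy to reproduce numerically. The sketch below uses the values $z = 9$ and $\sigma = 0.5$ from the caption of Figure 1; the ensemble size, seed, and the grid used to normalize the posterior density are choices made here.

```python
import numpy as np

sigma, z = 0.5, 9.0
rng = np.random.default_rng(3)
M = 200_000

# EnKF-type update with empirical covariances for G(u) = u^2
U = rng.normal(size=M)
Z = U**2 + sigma * rng.normal(size=M)
K = np.cov(U, Z)[0, 1] / np.var(Z, ddof=1)   # close to 0 up to sampling error
U_a = U + K * (z - Z)

# Posterior density on a uniform grid, normalized by summation
u = np.linspace(-6.0, 6.0, 20_001)
log_post = -(sigma**2 * u**2 + (z - u**2) ** 2) / (2.0 * sigma**2)
post = np.exp(log_post - log_post.max())
post /= post.sum()

post_abs_mean = np.sum(np.abs(u) * post)     # posterior mass sits near |u| = 3
ens_abs_mean = np.abs(U_a).mean()            # analysis ensemble stays near the prior
```

The posterior is bimodal with its mass concentrated near $u = \pm 3$, whereas the analysis ensemble essentially remains a sample of the prior $N(0,1)$; both have mean close to zero.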


Figure 1: Density of the posterior $\mu^z$ (dashed, blue line) and the probability density of the analysis variable
$U^a$ (solid, red line) for $z = 9$ and $\sigma = 0.5$.

5 Numerical Examples

To illustrate the application of the EnKF and PCKF to simple Bayesian inverse problems, we consider in the
following a one-dimensional elliptic boundary value problem and a time-dependent RLC circuit model.

5.1 1D Elliptic Boundary Value Problem

Let $D = [0, 1]$ and

$$-\frac{d}{dx} \Big( \exp(u_1) \frac{d}{dx} p(x) \Big) = f(x), \qquad p(0) = p_0, \quad p(1) = u_2, \qquad (15)$$

be given where $u = (u_1, u_2)$ are unknown parameters. The solution of (15) is

$$p(x) = p_0 + (u_2 - p_0)x + \exp(-u_1) \big( S_1(F)x - S_x(F) \big), \qquad (16)$$

where $S_x(g) := \int_0^x g(y)\, dy$ and $F(x) = S_x(f) = \int_0^x f(y)\, dy$. For simplicity we choose $f \equiv 1$, $p_0 = 0$
in the following and assume noisy measurements have been made of $p$ at $x_1 = 0.25$ and $x_2 = 0.75$ with
values $z = (27.5, 79.7)$. We seek to infer $u$ based on this data and on a priori information modelled by
$(U_1, U_2) \sim N(0, 1) \otimes \mathrm{Uni}(90, 110)$, where $\mathrm{Uni}(a, b)$ denotes the uniform distribution on the interval $[a, b]$.

Thus the forward map here is $G(u) = (p(x_1), p(x_2))$, where $p$ is given in (16) with $f \equiv 1$ and $p_0 = 0$. As
the model for the measurement noise we take $\varepsilon \sim N(0, 0.01\, I_2)$.
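The forward map of this example is available in closed form and can be coded directly. The sketch below assumes the sign convention $-(\exp(u_1) p')' = f$ with $f \equiv 1$ and $p_0 = 0$, for which the solution reads $p(x) = u_2 x + \exp(-u_1)\, x(1-x)/2$; the function names are hypothetical. The convention is checked by a finite-difference residual and by evaluating the map at the posterior mean reported in the text.

```python
import numpy as np

X_OBS = np.array([0.25, 0.75])     # measurement locations x_1, x_2

def p(x, u1, u2):
    """Solution of -(exp(u1) p')' = 1 on [0, 1] with p(0) = 0, p(1) = u2."""
    return u2 * x + np.exp(-u1) * 0.5 * x * (1.0 - x)

def G(u):
    """Forward map u = (u1, u2) -> (p(x_1), p(x_2))."""
    return p(X_OBS, u[0], u[1])

# Finite-difference residual of the differential equation at an interior point
u1, u2, x0, h = -1.0, 100.0, 0.3, 1e-3
second_diff = (p(x0 + h, u1, u2) - 2.0 * p(x0, u1, u2) + p(x0 - h, u1, u2)) / h**2
residual = -np.exp(u1) * second_diff - 1.0

# Forward map at the posterior mean (-2.65, 104.5) reported in the text
z_at_post_mean = G(np.array([-2.65, 104.5]))
```

With this convention the forward map evaluated at the reported posterior mean lies close to the data $z = (27.5, 79.7)$, which supports the assumed sign.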

In Figure 2 we show the prior and posterior densities as well as 1000 ensemble members of the initial and 
analysis ensemble obtained by the EnKF. A total ensemble size of M = 10^ was chosen in order to reduce 
the sampling error to a negligible level. It can be seen, however, that the analysis EnKF-ensemble does not
follow the posterior distribution, although its mean $(-2.92, 105.14)$ is quite close to the true posterior mean
$(-2.65, 104.5)$ (computed by quadrature). To illustrate the difference between the distribution of the analysis
ensemble/variable and the true posterior distribution, we present the marginal posterior distributions of $u_1$
and $u_2$ in Figure 3. For the posterior the marginals were evaluated by quadrature, whereas for the analysis
ensemble we show a relative frequency plot.

Figure 2: Left: Contour plot of the negative logarithm of the prior density and the locations of 1,000 ensemble
members of the initial EnKF-ensemble.
Right: Contour plot of the logarithm of the negative logarithm of the posterior density and the locations of
the updated 1,000 ensemble members in the analysis EnKF-ensemble.




Figure 3: Posterior marginals and relative frequencies in the analysis ensemble for $u_1$ (left) and $u_2$ (right).

We remark that slightly changing the observational data to z = (23.8, 71.3) moves the analysis ensemble 
as well as the distribution of the analysis RV much closer to the true posterior, as shown in Figure 4. Moreover, 
for these measurement values the mean of the analysis ensemble (0.33, 94.94) provides a better fit to the true 
posterior mean (0.33, 94.94). 



Figure 4: Left: Contours of the logarithm of the negative log posterior density and locations of 1,000 members
of the analysis EnKF-ensemble.
Middle, Right: Posterior marginals and relative frequencies in the analysis ensemble for $u_1$ (middle) and $u_2$
(right).

To reaffirm the fact that only the mean of the analysis variable $U^a$ depends on the actual data, we show
density estimates for the marginals of $u_1$ and $u_2$ of $U^a$ in Figure 5 obtained from the observational data
$z = (27.5, 79.7)$ (blue lines) and $\tilde{z} = (23.8, 71.3)$ (green lines), respectively. The density estimates were
obtained by normal kernel density estimation (KDE, in this case Matlab's ksdensity routine) based on
the resulting analysis ensembles $(u^a_1, u^a_2)$ and $(\tilde{u}^a_1, \tilde{u}^a_2)$ for the data sets $z$ and $\tilde{z}$, respectively. We observe
that the marginal distributions of the centered ensembles coincide, in agreement with Theorem 13.

In addition, whenever the prior and thus also the posterior support for $u_2$ is bounded, as in this example,
the EnKF may yield members in the analysis ensemble which are outside this support. This is a further
consequence of Theorem 13: since the analysis ensemble of the EnKF follows the distribution of the analysis
variable rather than the true posterior distribution, ensemble members lying outside the posterior support can
always occur whenever the support of the analysis variable is not a subset of the support of the posterior.

Figure 5: Left: Kernel density estimates for $u^a_1$ (blue, solid line) and $\tilde{u}^a_1$ (green, dashed line).
Middle, Right: Kernel density estimates for $u^a_i - \mathbb{E}[u^a_i]$ (blue, solid) and $\tilde{u}^a_i - \mathbb{E}[\tilde{u}^a_i]$ (green, dashed), $i = 1, 2$.

Finally, we would like to stress that whether or not the distribution of the analysis variable is a good fit
to the true posterior distribution depends entirely on the observed data, which can neither be controlled nor
known a priori.

Applying the PCKF to this simple example problem can be done analytically. We require four basic 
independent random variables $\xi_1 \sim N(0, 1)$, $\xi_2 \sim \mathrm{Uni}(0, 1)$, $\xi_3 \sim N(0, 1)$ and $\xi_4 \sim N(0, 1)$ to define PCEs
which yield random variables distributed according to the prior and error distributions:

$$U := (\xi_1, \; 90 + 20\,\xi_2)^\top \sim \mu_0, \qquad \varepsilon := (0.1\,\xi_3, \; 0.1\,\xi_4)^\top \sim \nu_\varepsilon.$$

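Moreover, due to (16), $G(U)$ is also available in closed form as

$$G(U) = \begin{pmatrix} c_{11}(90 + 20\,\xi_2) + c_{12} \sum_{n=0}^{\infty} \frac{(-1)^n \sqrt{e}}{\sqrt{n!}} H_n(\xi_1) \\ c_{21}(90 + 20\,\xi_2) + c_{22} \sum_{n=0}^{\infty} \frac{(-1)^n \sqrt{e}}{\sqrt{n!}} H_n(\xi_1) \end{pmatrix},$$

where $H_n$ denotes the $n$th normalized Hermite polynomial and $c_{11}, c_{12}, c_{21}, c_{22}$ can be deduced from inserting
$x = 0.25$ and $x = 0.75$ into (16). Here we have used the Hermite expansion of $\exp(-\xi)$, see also [39,
Example 2.2.7]. Thus, the chaos coefficient vectors of $U$ and $G(U) + \varepsilon$ w.r.t. the polynomials

$$P_\alpha(\xi) = H_{\alpha_1}(\xi_1)\, L_{\alpha_2}(\xi_2)\, H_{\alpha_3}(\xi_3)\, H_{\alpha_4}(\xi_4), \qquad \alpha \in \mathbb{N}_0^4,$$

can be obtained explicitly, where $H_\alpha$ and $L_\alpha$ denote the normalized Hermite and Legendre polynomials of
degree $\alpha$, respectively. In particular, the nonvanishing chaos coefficients involve only the basis polynomials

$$P_0(\xi) = 1, \quad P_1(\xi) = L_1(\xi_2), \quad P_2(\xi) = H_1(\xi_3), \quad P_3(\xi) = H_1(\xi_4)$$

and $P_\alpha(\xi) = H_{\alpha - 3}(\xi_1)$ for $\alpha \geq 4$. Arranging the two-dimensional chaos coefficients of $U$ and $G(U)$ as the
column vectors of the matrices $[U], [G(U)] \in \mathbb{R}^{2 \times \mathbb{N}_0}$ and denoting by $[\dot{U}]$ the matrix $[u_1, u_2, \dots] \in \mathbb{R}^{2 \times \mathbb{N}}$
without the zeroth column, and analogously $[\dot{G}(U)]$, we get

$$K = [\dot{U}]\,[\dot{G}(U)]^\top \big( [\dot{G}(U)]\,[\dot{G}(U)]^\top + 0.01\, I_2 \big)^{-1}.$$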
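The gain can be evaluated from truncated chaos coefficient matrices and compared with closed-form covariances. The sketch below is an illustration under assumptions made here: the sign convention $p(x) = u_2 x + \exp(-u_1)\, x(1-x)/2$ for the solution of (15), the mean column dropped, the noise contribution entering only through the $0.01\, I_2$ term, and 50 retained Hermite terms in $\xi_1$. All variable names are hypothetical.

```python
import numpy as np
from math import factorial, sqrt, e

x_obs = np.array([0.25, 0.75])
c = 0.5 * x_obs * (1.0 - x_obs)      # factor of exp(-u1) in p(x_i), assumed sign convention
n_herm = 50                          # retained Hermite terms H_1, ..., H_50 in xi_1

# Columns: [L_1(xi_2), H_1(xi_1), ..., H_n(xi_1)]; the mean column is dropped.
U_dot = np.zeros((2, 1 + n_herm))
U_dot[1, 0] = 20.0 / sqrt(12.0)      # U_2 = 100 + (20 / sqrt(12)) L_1(xi_2)
U_dot[0, 1] = 1.0                    # U_1 = H_1(xi_1)

G_dot = np.zeros((2, 1 + n_herm))
G_dot[:, 0] = x_obs * 20.0 / sqrt(12.0)
herm = np.array([(-1.0) ** n * sqrt(e) / sqrt(factorial(n))
                 for n in range(1, n_herm + 1)])   # coefficients of exp(-xi_1)
G_dot[:, 1:] = np.outer(c, herm)

K_J = U_dot @ G_dot.T @ np.linalg.inv(G_dot @ G_dot.T + 0.01 * np.eye(2))

# Closed-form covariances: Cov(xi, exp(-xi)) = -sqrt(e), Var(exp(-xi)) = e^2 - e
cov_uz = np.vstack([-sqrt(e) * c, x_obs * 400.0 / 12.0])
cov_zz = ((400.0 / 12.0) * np.outer(x_obs, x_obs)
          + (e**2 - e) * np.outer(c, c) + 0.01 * np.eye(2))
K_exact = cov_uz @ np.linalg.inv(cov_zz)
```

Because the Hermite coefficients of $\exp(-\xi_1)$ decay like $1/\sqrt{n!}$, the truncated gain `K_J` already agrees with `K_exact` to near machine precision for this truncation length.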

Thus, the only numerical error incurred in applying the PCKF in this example is the truncation of the PCE. 
We have carried out this calculation using a truncated PCE of length J = 4 + 50 according to the reduced 
basis above. In particular, we evaluated the approximation $K_J$ to $K$ by using the truncated vector $[P_J G(U)]$
in the formula above and then performed the update of the chaos coefficients according to (11). After that
M = 10^ samples of the resulting random variable Uj were drawn, but since the empirical distributions were 
essentially indistinguishable from those obtained by the EnKF described previously, they are omitted here.

Remark 16. Although a detailed complexity analysis of these methods is beyond the scope of this work, we
mention that the EnKF calls for $M$ evaluations of the forward map $G(u_j)$, $j = 1, \dots, M$, whereas the PCKF
requires computing the chaos coefficients of $G(U)$ by, e.g., the stochastic Galerkin method. Thus the former
yields, in general, many small systems to solve, whereas the latter typically requires the solution of a large
coupled system. Moreover, we emphasize the computational savings of applying Kalman filters compared to
a “full Bayesian update”, i.e., sampling from the posterior measure by MCMC methods. In particular, each
MCMC run may require many hundreds of thousands of evaluations of the forward map $G(u)$, e.g., one for each iterate
$u_j$ of the Markov chain as in the case of Metropolis-Hastings MCMC. Hence, if one is interested in only the
posterior mean as a Bayes estimate, then EnKF and PCKF provide substantially less expensive alternatives
to MCMC for its approximation by the linear posterior mean.


5.2 Dynamical System: RLC circuit 


We apply the EnKF to sequential data assimilation in a simple dynamical system: a damped LC-circuit or
RLC-circuit. Denoting the initial voltage by $U_0$, the resistance by $R$, the inductance by $L$ and the capacitance
by $C$, and assuming $R < 2\sqrt{L/C}$, the voltage and current in the circuit can be modelled as

$$U(t) = U_0\, e^{-\delta t} \Big( \cos(\omega_e t) + \frac{\delta}{\omega_e} \sin(\omega_e t) \Big), \qquad (17a)$$

$$I(t) = -\frac{U_0}{\omega_e L}\, e^{-\delta t} \sin(\omega_e t), \qquad (17b)$$

where $\delta = R/(2L)$, $\omega_e = \sqrt{\omega_0^2 - \delta^2}$ and $\omega_0 = 1/\sqrt{LC}$. The data assimilation setting is now as follows. We
observe the state of the system (17) at four time points $t_n = 5n$, $n = 1, \dots, 4$, where all observations $z \in \mathbb{R}^8$
are corrupted by measurement noise $\varepsilon \sim N(0, \mathrm{diag}(\sigma_1^2, \dots, \sigma_8^2))$. Here we have chosen $\sigma^2_{2n-1} = 0.1\,|U(t_n)|$
and $\sigma^2_{2n} = 0.1\,|I(t_n)|$ for $n = 1, \dots, 4$. We want to infer $U_0$ and $L$ based on these observations, i.e., the
unknown is $u = (U_0, L)$, and we take as prior $(U_0, L) \sim N(0.5, 0.25) \otimes \mathrm{Uni}(1, 5)$. Given observations
$z \in \mathbb{R}^8$ we compare two assimilation strategies for applying the EnKF:


• Simultaneous: We apply the EnKF to the inverse problem

$$Z = G(u) + \varepsilon,$$

where $G$ maps $(U_0, L)$ to the states $(U(t_1), I(t_1), \dots, U(t_4), I(t_4)) \in \mathbb{R}^8$. Thus, we perform one EnKF
update using all the available data at once, resulting in one EnKF analysis ensemble.

• Sequential: We apply the EnKF to the inverse problems

$$Z_n = G_n(u) + \varepsilon_n, \qquad n = 1, \dots, 4,$$

where $G_n$ maps $(U_0, L)$ to the state $(U(t_n), I(t_n)) \in \mathbb{R}^2$. In particular, we will perform four EnKF
updates using at each update only the corrupted data $z_n = (U(t_n) + \varepsilon_{2n-1}, I(t_n) + \varepsilon_{2n})$. This yields,
for each update, one EnKF analysis ensemble which, in turn, serves as the initial ensemble for the next
update.
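The two forward maps used by these strategies can be sketched as follows. The code assumes the standard underdamped series RLC solution, i.e. an exponential damping factor $e^{-\delta t}$ in both voltage and current, and the values $R = 0.5$, $C = 0.5$ of this example; the function names are hypothetical.

```python
import numpy as np

def rlc_state(U0, L, t, R=0.5, C=0.5):
    """Voltage U(t) and current I(t) of the underdamped RLC circuit (17)."""
    delta = R / (2.0 * L)
    w0 = 1.0 / np.sqrt(L * C)
    we = np.sqrt(w0**2 - delta**2)         # real only if R < 2 sqrt(L / C)
    decay = np.exp(-delta * t)
    U = U0 * decay * (np.cos(we * t) + delta / we * np.sin(we * t))
    I = -U0 / (we * L) * decay * np.sin(we * t)
    return U, I

T_OBS = 5.0 * np.arange(1, 5)              # t_n = 5 n, n = 1, ..., 4

def G_simultaneous(u):
    """(U0, L) -> (U(t_1), I(t_1), ..., U(t_4), I(t_4)) in R^8."""
    U, I = rlc_state(u[0], u[1], T_OBS)
    return np.column_stack([U, I]).ravel()

def G_sequential(u, n):
    """(U0, L) -> (U(t_n), I(t_n)) in R^2 for n = 1, ..., 4."""
    U, I = rlc_state(u[0], u[1], T_OBS[n - 1])
    return np.array([U, I])

u_true = np.array([0.75, 1.5])             # (U0, L) used to generate the data
g_all = G_simultaneous(u_true)
```

The simultaneous map stacks the four sequential maps, so a single EnKF update with `G_simultaneous` sees all eight observations at once, while the sequential strategy calls `G_sequential` once per update.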

Again we use two different data sets $z$ and $\tilde{z}$, obtained by two realizations of $\varepsilon$ given the solution of (17) for
$U_0 = 0.75$, $R = 0.5$, $L = 1.5$, $C = 0.5$, namely
$z = (0.505, 0.237, 0.014, 0.096, 0.036, 0.011, -0.002, -0.003)$ and
$\tilde{z} = (0.265, 0.066, 0.058, 0.002, 0.021, 0.012, 0.007, -0.01)$.

The resulting posteriors and EnKF analysis ensembles for the simultaneous and sequential update are
presented in Fig. 6. We observe again that, for different data sets, the EnKF results in an ensemble which
follows a distribution which is, in one case, quite close and, in the other, quite far away from the true posterior
distribution. Interestingly, the difference between the two updating schemes does not seem to be too large.
This also holds true for the means of the EnKF analysis ensembles when compared to the true posterior means
in Table 1 for both data sets $z$ and $\tilde{z}$.

Figure 6: Contours of the logarithm of the negative log posterior density and locations of 1,000 members of
the analysis EnKF-ensembles resulting from simultaneous and sequential updating for the two different sets
$z$ and $\tilde{z}$ of observation data. (Panel titles: Simultaneous EnKF update for data $z$; Simultaneous EnKF update for data $\tilde{z}$.)


Finally, we are again interested in the marginals of the posterior and the associated histograms of the
EnKF analysis ensembles which give us a rough impression of the difference between the distribution of
the analysis variable and the true posterior. In Fig. 7 we compare both marginals for the 4th update and
the simultaneous analysis ensemble for both data sets. The distribution of the simultaneous EnKF analysis
ensemble should, up to a shift of its mean, not depend on the data, whereas the distribution of the final EnKF
analysis ensemble for the sequential updating clearly does in this example. This is certainly caused by the
nonlinearity of the forward map $G$: in the sequential updating the former analysis variable $U^a_n$ serves as the
initial one for the current update step $n + 1$; therefore, the difference in the mean of the former analysis
variables $U^a_n$, $\tilde{U}^a_n$ for different data sets $z$, $\tilde{z}$ might yield different forecast RVs $G(U^a_n)$, $G(\tilde{U}^a_n)$ due to the
nonlinearity of $G$, which yields different next analysis variables $U^a_{n+1}$.

| update | EnKF mean for data $z$ | posterior mean for data $z$ | EnKF mean for data $\tilde{z}$ | posterior mean for data $\tilde{z}$ |
|--------|------------------------|-----------------------------|--------------------------------|-------------------------------------|
| 1      | (0.42, 1.56)           | (0.42, 2.42)                | (0.27, 2.25)                   | (0.35, 2.61)                        |
| 2      | (0.44, 1.53)           | (0.39, 2.36)                | (0.20, 2.20)                   | (0.32, 2.56)                        |
| 3      | (0.43, 1.59)           | (0.38, 2.34)                | (0.19, 2.26)                   | (0.31, 2.52)                        |
| 4      | (0.43, 1.59)           | (0.38, 2.32)                | (0.19, 2.24)                   | (0.30, 2.50)                        |
| Simu.  | (0.58, 1.84)           | (0.38, 2.32)                | (0.38, 2.40)                   | (0.30, 2.50)                        |

Table 1: Means of the EnKF analysis ensembles and corresponding true posterior means for data $z$ (left) and
$\tilde{z}$ (right).

Figure 7: Posterior marginals and relative frequencies of the final EnKF analysis ensembles in $u_1$, $u_2$ for $z$
(left part) and $\tilde{z}$ (right part). (Panel titles: Simultaneous EnKF update for data $z$; Simultaneous EnKF update for data $\tilde{z}$.)


6 Conclusions 

We have given a detailed analysis of two popular generalized Kalman filtering methods, the EnKF and PCKF,
applied to nonlinear (stationary) Bayesian inverse problems. We recalled the Bayesian approach to inverse
problems and its solution, the posterior measure, in a Hilbert space setting, for which we slightly generalized
existing results concerning the well-posedness of Bayesian inverse problems. Further, in order to characterize
Kalman filter methods in the Bayesian framework, we also described Bayes estimators and highlighted the
distinction between the two objectives of inference and identification in Bayesian inversion realized by the
posterior measure and Bayes estimators, respectively. We then proved the convergence of the approximations
provided by the EnKF and PCKF to a so-called analysis random variable in the large ensemble and large
polynomial basis limit, respectively, reaffirming the fact that both methods are merely different numerical
discretizations of the same updating scheme for random variables. Moreover, the relation of both Kalman
filter methods to a specific Bayes estimator, the linear posterior mean estimator, followed from this. Hence,
this work shows that the EnKF and PCKF are methods suited for identification, providing in addition the
random a priori estimation error, rather than methods for rigorous inference in the sense of (regular versions
of) the conditional measure. Several carefully chosen numerical examples were given to illustrate these basic
differences.